Facet with large number of unigram entries

2012-12-04 Thread Andreas Niekler

Dear List,

I have an index with 2.000.000 articles. All those texts get tokenized 
while indexing. On this data I run a faceted query like this (to retrieve 
associated words):


select?q=a_spell:{some word}&facet.method=enum&facet=true&facet.field=Paragraph&facet.limit=10&facet.prefix={some prefix}&facet.mincount=1500&indent=1&fl=_id&wt=json&rows=0



I have more than 5.000.000 unique tokens in the index and the facet query 
is quite slow. I also tried different FastLRUCache settings for the 
filterCache.


Does anybody have a hint how I could improve performance within this setup?

Thank you all

--
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig

mail: aniek...@informatik.uni-leipzig.de


Re: Backing up SolR 4.0

2012-12-04 Thread Andy D'Arcy Jewell

On 03/12/12 18:04, Shawn Heisey wrote:


Serious production Solr installs require at least two copies of your 
index.  Failures *will* happen, and sometimes they'll be the kind of 
failures that will take down an entire machine.  You can plan for some 
failures -- redundant power supply and RAID are important for this.  
Some failures will cause downtime, though -- multiple disk failures, 
motherboard, CPU, memory, software problems wiping out your index, 
user error, etc.  If you have at least one other copy of your index, 
you'll be able to keep the system operational while you fix the down 
machine.


Replication is a very good way to accomplish getting two or more 
copies of your index.  I would expect that most production Solr 
installations use either plain replication or SolrCloud.  I do my 
redundancy a different way that gives me a lot more flexibility, but 
replication is a VERY solid way to go.


If you are running on a UNIX/Linux platform (just about anything 
*other* than Windows), and backups via replication are not enough for 
you, you can use the hardlink capability in the OS to avoid taking 
Solr down while you make backups.  Here's the basic sequence:


1) Pause indexing, wait for all commits and merges to complete.
2) Create a target directory on the same filesystem as your Solr index.
3) Make hardlinks of all files in your Solr index in the target 
directory.

4) Resume indexing.
5) Copy the target directory to your backup location at your leisure.
6) Delete the hardlink copies from the target directory.
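
A minimal sketch of steps 2 and 3 on Linux, assuming the index lives at 
/var/solr/data/index (GNU cp: -a preserves attributes, -l makes hardlinks 
instead of copying the data; both paths must be on the same filesystem):

# near-instant hardlink snapshot of the index directory
cp -al /var/solr/data/index /var/solr/data/index-snapshot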

Making hardlinks is a near-instantaneous operation.  The way that 
Solr/Lucene works will guarantee that your hardlink copy will continue 
to be a valid index snapshot no matter what happens to the live 
index.  If you can make the backup and get the hardlinks deleted 
before your index undergoes a merge, the hardlinks will use very 
little extra disk space.


If you leave the hardlink copies around, eventually your live index 
will diverge to the point where the copy has different files and 
therefore takes up disk space.  If you have a *LOT* of extra disk 
space on the Solr server, you can keep multiple hardlink copies around 
as snapshots.


Recent versions of Windows do have features similar to UNIX links, so 
there may in fact be a way to do this on Windows.  I will leave that 
for someone else to pursue.


Thanks,
Shawn

Thanks Shawn, that's very informative. I get twitchy with anything where 
you can't back it up (memcached excepted). As an administrator, it's 
my job to recover from failures, and backups are kind of my comfort blanket.


I'm running on Linux (on Debian Squeeze) in a fully virtual 
environment.  Initially, I think I'll have to just schedule the backup 
for the early hours (local time) but as we grow, I can see I'll have to 
use replication to do it seamlessly. The system is necessarily small 
right now, as we haven't yet gone live, but we are anticipating rapid 
growth, so replication has always been on the cards.


Is there an easy way to tell (say from a shell script) when all commits 
and merges [are] complete?


If I keep a replica solely for backup purposes, I assume I can do what 
I like with it - presumably replication will resume/catch-up when I 
resume it (I admit, I have a bit of reading to do wrt replication - I 
just skimmed that because it wasn't in my initial brief).


I'm assuming that because you're using hardlinks, that means that SolR 
writes a new file when it updates (sort of copy-on-write style)? So we 
are relying on the principle that as long as you have at least one 
remaining reference to the data, it's not deleted...


Thanks once again!

-Andy



--
Andy D'Arcy Jewell

SysMicro Limited
Linux Support
E:  andy.jew...@sysmicro.co.uk
W:  www.sysmicro.co.uk



Re: Whole Phrase search in Solr

2012-12-04 Thread NickA
Hello Jack,

You are the man!

Indeed, this was the problem. We tried several combinations and we thought
that we did that too, but somehow we failed to see that your proposal was
working! Don't know why - maybe we had something else changed in parallel.

So, THANK YOU, you have been a great support!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Whole-Phrase-search-in-Solr-tp4023931p4024196.html
Sent from the Solr - User mailing list archive at Nabble.com.


replication of files when index is stable/static (SOLR-1304?)

2012-12-04 Thread Fredrik Rødland
I have a static index with config-files changing frequently.

Until now I've distributed these files to all Solr hosts in my current setup 
manually, but I'm wondering if I can get Solr to do this using the 
config replication.

Searching Google, I've come across 
https://issues.apache.org/jira/browse/SOLR-1304.

Does anyone know if any work has been done on this issue, or if there are other 
workarounds?

Adding (followed by deleting) a dummy document seems to trigger the 
replication, but this is hardly the best solution.

Regards,

Fredrik Rodland


--
Fredrik Rødland   Mail:fred...@rodland.no
  Cell:+47 99 21 98 17
  Twitter: @fredrikr
Maisen Pedersens vei 1Flickr:  http://www.flickr.com/fmmr/
NO-1363 Høvik, NORWAY Web: http://about.me/fmr



SQL DIH - Can I have some guidance please?

2012-12-04 Thread Spadez
Hi.

I am having a bit of trouble figuring out the DIH for SQL databases. I have
asked around a few different places but haven't got any replies, so I was
hoping you could help me.

*I have a database schema like this:*

CREATE TABLE company (
id SERIAL PRIMARY KEY,
name varchar(60) NOT NULL
);

CREATE TABLE country (
id SERIAL PRIMARY KEY,
name varchar(255) NOT NULL
);

CREATE TABLE location (
id SERIAL PRIMARY KEY,
name varchar(255) NOT NULL,
coordinate varchar(255) NOT NULL,
location_id integer NOT NULL REFERENCES country (id)
);

CREATE TABLE source (
id SERIAL PRIMARY KEY,
name varchar(60) NOT NULL
);

CREATE TABLE item (
id SERIAL PRIMARY KEY,
title varchar(60) NOT NULL,
description varchar(900) NOT NULL,
company_id integer NOT NULL REFERENCES company (id),
date timestamp NOT NULL,
source_id integer NOT NULL REFERENCES source (id),
link varchar(255) NOT NULL,
location_id integer NOT NULL REFERENCES location (id)
);

*What I want to put into my schema is this information (named as they are
in my schema):*

id
title
description
date
source
link
location_name
location_coordinates

*I made my DIH like this:*

<dataConfig>
  <dataSource name="app-ds" driver="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
      password="secret" />
  <document>
    <entity dataSource="app-ds" name="application" query="SELECT id,
        page_title from page">
      <field column="id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="date" name="date" />
      <field column="name" name="source" />
      <field column="link" name="link" />
      <field column="name" name="location_name" />
      <field column="coordinate" name="location_coordinates" />
    </entity>
  </document>
</dataConfig>

My main questions relate to the entity dataSource query and also what to do
for field columns when it is a linked table. For example, the word "name"
isn't unique, since it appears in several different tables.

I would really appreciate any help on this; it's taken me a while to get to
this stage and now I am truly stuck.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to change Solr UI

2012-12-04 Thread Erik Hatcher
It's a shame wt=velocity gets a bad rap because /update isn't out of the box 
strict with the HTTP/RESTful scene.  A delete should be a DELETE of some sort.

There are 3rd party standalone apps.  There was even a standalone ruby app 
(flare) that was once upon a time in Solr's svn, but really the Solr committers 
can't be expected to maintain all those various examples and keep them up to 
date and working, so best to keep them 3rd party IMO.  We've got Blacklight, 
VuFind, and all sorts of other front-ends out there with their own vibrant 
communities.

I'm -1 for removing VW (it's a contrib plugin as it is already, just like 
/update/extract).  /browse certainly could use a cleaning up / revamping, but 
it's good stuff if I do say so myself and very handy to have available for 
several reasons*.

Let's try not to conflate wt=velocity with /update being more easily dangerous 
than it probably should be.  But let's also be clear always that Solr is meant 
to be behind the firewall as its primary and default place in the world. 

Erik

* One I'll share: There is a real-world use case of a (relatively big) company 
using wt=velocity to generate e-mail (for saved searches) texts very 
conveniently in a backend environment and at very high speed, with no other 
technologies/complexities needed in the mix but Solr and a little custom 
templating. 

On Dec 3, 2012, at 20:58 , Jack Krupansky wrote:

 It is annoying to have to repeat these explanations so much.
 
 Any serious objection to removing the VW UI from Solr proper and replacing it 
 with a standalone app?
 
 I mean, Solr should have PHP, python, Java, and ruby example apps, right?
 
 -- Jack Krupansky
 
 -Original Message- From: Iwan Hanjoyo
 Sent: Monday, December 03, 2012 8:28 PM
 To: solr-user@lucene.apache.org
 Subject: Re: How to change Solr UI
 
 
 
 Note that Velocity _can_ be used for user-facing code, but be very sure you
 secure your Solr. If you allow direct access, a user can easily enter
 something like http://
 solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>.
 And all your documents will be gone.
 
 Hi Erickson,
 
 Thank you for the input.
 I'll notice and filter out this url.
 * http://
 solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>
 
 Kind regards,
 
 Hanjoyo 



Re: SOLR4 cluster - strange CPU spike on slave

2012-12-04 Thread John Nielsen
Success!

I tried adding -XX:+UseConcMarkSweepGC to the Java options to make it GC earlier. We
haven't seen any spikes since.
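
For reference, a sketch of where the flag goes - assuming the stock Jetty
start.jar from the Solr example, with a placeholder heap size:

java -Xmx4g -XX:+UseConcMarkSweepGC -jar start.jar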

I'm cautiously optimistic though and will be monitoring the servers for a
week or so before declaring final victory.

The post about MMapDirectory is really interesting. We switched to using
it from NRTCachingDirectory and are monitoring performance as well.
Initially performance doesn't look stellar, but I suspect that we lack
memory in the server to really make it shine.


Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Fri, Nov 30, 2012 at 3:13 PM, Erick Erickson erickerick...@gmail.com wrote:

 right, so here's what I'd check for.

 Your logs should show a replication pretty coincident with the spike and
 that should be in the log. Note: the replication should complete just
 before the spike.

 Or you can just turn replication off and fire it manually to try to force
 the situation at will, see:
 http://wiki.apache.org/solr/SolrReplication#HTTP_API. (but note that
 you'll
 have to wait until the index has changed on the master to see any action).

 So you should be able to create your spike at will. And this will be pretty
 normal. When replication happens, a new searcher is opened, caches are
 filled, autowarming is done, all kinds of stuff like that. During this
 period, the _old_ searcher is still open, which will both cause the CPU to
 be busier and require additional memory. Once the new searcher is warmed,
 new queries go to it, and when the old searcher has finished serving all
 the queries it shuts down and all the resources are freed. Which is why
 commits are expensive operations.

 All of which means that so far I don't think there's a problem, this is
 just normal Solr operation. If you're seeing responsiveness problems when
 serving queries you probably want to throw more hardware (particularly
 memory) at the problem.

 But when thinking about memory allocating to the JVM, _really_ read Uwe's
 post here:
 http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

 Best
 Erick


 On Thu, Nov 29, 2012 at 2:39 AM, John Nielsen j...@mcb.dk wrote:

  Yup you read it right.
 
  We originally intended to do all our indexing to varnish02, replicate to
  varnish01 and then search from varnish01 (through a fail-over ip which
  would switch the reader to varnish02 in case of trouble).
 
  When I saw the spikes, I tried to eliminate possibilities by starting
  searching from varnish02, leaving varnish01 with nothing to do but to
  receive replication data. This did not remove the spikes. As soon as this
  spike is fixed, I will start searching from varnish01 again. These sort
 of
  debug antics are only possible because, although we do have customers
 using
  this, we are still in our beta phase.
 
  Varnish01 never receives any manual commit orders. Varnish02 does from
 time
  to time.
 
  Oh, and I accidentally misinformed you before. (damn secondary language)
 We
  are actually seeing the spikes on both servers. I was just focusing on
  varnish01 because I use it to eliminate possibilities.
 
  It just occurred to me now; We tried switching off our feeder/index tool
  for 24 hours, and we didn't see any spikes during this period, so
 receiving
  replication data certainly has something to do with it.
 
  Med venlig hilsen / Best regards
 
  *John Nielsen*
  Programmer
 
 
 
  *MCB A/S*
  Enghaven 15
  DK-7500 Holstebro
 
  Kundeservice: +45 9610 2824
  p...@mcb.dk
  www.mcb.dk
 
 
 
  On Thu, Nov 29, 2012 at 3:20 AM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   Am I reading this right? All you're doing on varnish1 is replicating to
  it?
   You're not searching or indexing? I'm sure I'm misreading this.
  
  
    "The spike, which only lasts for a couple of minutes, sends the disks
    racing." This _sounds_ suspiciously like segment merging, especially the
    "disks racing" bit. Or possibly replication. Neither of which makes much
   sense. But is there any chance that somehow multiple commits are being
   issued? Of course if varnish1 is a slave, that shouldn't be happening
   either.
  
   And the whole bit about nothing going to the logs is just bizarre. I'm
   tempted to claim hardware gremlins, especially if you see nothing
 similar
   on varnish2. Or some other process is pegging the machine. All of which
  is
    a way of saying "I have no idea".
  
   Yours in bewilderment,
   Erick
  
  
  
   On Wed, Nov 28, 2012 at 6:15 AM, John Nielsen j...@mcb.dk wrote:
  
I apologize for the late reply.
   
The query load is more or less stable during the spikes. There are
  always
fluctuations, but nothing on the order of magnitude that could
 explain
   this
 spike. In fact, the latest spike occurred last night when there was
 almost no one using it.
   
To test a hunch of mine, I tried to deactivate all caches by
 commenting
   

Two databases merge into SOLR - How to keep unique ID?

2012-12-04 Thread Spadez
I have two databases (unfortunately they do have to be separate) which get
imported into Solr.

Each database has a primary key for each item but I am concerned that when
it comes to importing the two into SOLR there will be more than one item
with the same ID (one from each DB).

Therefore, in order to keep two separate databases with two different
uniqueIDs, I was wondering what kind of options I have.

Should I append a letter to the primary key of one DB? Is this even
possible?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Two-databases-merge-into-SOLR-How-to-keep-unique-ID-tp4024217.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SQL DIH - Can I have some guidance please?

2012-12-04 Thread Gora Mohanty
On 04/12/2012, Spadez james_will...@hotmail.com wrote:
 Hi.

 I am having a bit of trouble figuring out the DIH for SQL databases. I have
 asked around a few different places but haven't got any replies, so I was
 hoping you could help me.

 *I have a database schema like this:*

 CREATE TABLE company (
 id SERIAL PRIMARY KEY,
 name varchar(60) NOT NULL
 );

 CREATE TABLE country (
 id SERIAL PRIMARY KEY,
 name varchar(255) NOT NULL
 );

 CREATE TABLE location (
 id SERIAL PRIMARY KEY,
 name varchar(255) NOT NULL,
 coordinate varchar(255) NOT NULL,
 location_id integer NOT NULL REFERENCES country (id)
 );

 CREATE TABLE source (
 id SERIAL PRIMARY KEY,
 name varchar(60) NOT NULL
 );

 CREATE TABLE item (
 id SERIAL PRIMARY KEY,
 title varchar(60) NOT NULL,
 description varchar(900) NOT NULL,
 company_id integer NOT NULL REFERENCES company (id),
 date timestamp NOT NULL,
 source_id integer NOT NULL REFERENCES source (id),
 link varchar(255) NOT NULL,
 location_id integer NOT NULL REFERENCES location (id)
 );

 *What I want to put into my schema is this information (named as they are
 in my schema):*

 id
 title
 description
 date
 source
 link
 location_name
 location_coordinates

It is not entirely clear:
(a) How the tables are related. One can guess that item is
     related to source through source_id, and to location through
     location_id.
(b) Which of the Solr fields are to be derived from which table.
     In particular, what are the tables company and country used
     for?

 *I made my DIH like this:*

The way to deal with related tables is by using nested entities; e.g.,
some of your fields can be populated by:

<dataConfig>
  <dataSource name="app-ds" driver="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
      password="secret" />
  <document>
    <entity dataSource="app-ds" name="item"
        query="SELECT id, title, location_id from item">
      <entity dataSource="app-ds" name="location"
          query="SELECT name, coordinate from location where location_id=${item.location_id}">
        <field column="id" name="id" />
        <field column="title" name="title" />
        <field column="name" name="location_name" />
        <field column="coordinate" name="location_coordinates" />
      </entity>
    </entity>
  </document>
</dataConfig>

You can add more second-level entities, and/or fields as needed. The
${item.location_id} refers to the location_id in the select from the top-
level entity.

 My main questions relate to the entity dataSource query and also what to do
 for field columns when it is a linked table. For example, the word "name"
 isn't unique, since it appears in several different tables.

You could change the name in the select, e.g.,
for the top-level entity have:
  query="select name as top_level_name"
and for the inner entity have:
  query="select name as second_level_name"

Regards,
Gora


Re: Two databases merge into SOLR - How to keep unique ID?

2012-12-04 Thread Gora Mohanty
On 04/12/2012, Spadez james_will...@hotmail.com wrote:
 I have two databases (unfortunately they do have to be separate) which get
 imported into Solr.

 Each database has a primary key for each item but I am concerned that when
 it comes to importing the two into SOLR there will be more than one item
 with the same ID (one from each DB).

 Therefore, in order to keep two separate databases with two different
 uniqueIDs, I was wondering what kind of options I have.

 Should I append a letter to the primary key of one DB? Is this even
 possible?

Yes, that sounds like a good plan. You can add this unique string in the
select, or use a simple ScriptTransformer for that.
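
For example - a sketch only; the 'db1-'/'db2-' prefixes and the column list
are hypothetical:

-- DIH query against database 1
SELECT 'db1-' || id AS id, title, description FROM item;
-- DIH query against database 2
SELECT 'db2-' || id AS id, title, description FROM item;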

Regards,
Gora


Sorting by multi-valued field

2012-12-04 Thread Thomas Heigl
Hey all!

In our system users can create recurring events and search for events
starting on or after a given date. Searching and filtering of events works
perfectly, but users expect the result set to be ordered by the next start
time.

For each event, we index a multi-valued date field containing all its start
times. The relevant parts of my schema look like this:

- event_id
 - start_times_dts


In SQL I would do something like:

WHERE start_times >= %SELECTED_DATE% GROUP BY event_id HAVING
 min(start_times) ORDER BY start_times ASC


The only thing I could think of so far is to index every single start time
of an event as a separate document and group on the event. This would solve
the sorting problem but would drastically increase our index size.
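
In query terms that workaround might look something like this - a sketch
only, assuming each start time were indexed as its own document with a
single-valued start_time_dt field (the date is just an example):

q=start_time_dt:[2012-12-04T00:00:00Z TO *]&group=true&group.field=event_id&group.limit=1&sort=start_time_dt asc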

Is there a more elegant way to do this in Solr? A function query or
subquery maybe? I thought about it for quite a while and couldn't come up
with a viable solution.

Cheers,

Thomas


Cannot run Solr4 from Intellij Idea

2012-12-04 Thread Artyom
After 2 days I have figured out how to open Solr 4 in IntelliJ IDEA 11.1.4 on
Tomcat 7. IntelliJ IDEA finds webapp/web/WEB-INF/web.xml and offers to make
a facet from it and adds this facet to the parent module, from which an
artifact can be created.

The problem is that Solr cannot run properly. I get this message:

SEVERE: Unable to create core: mycore
org.apache.solr.common.SolrException: Plugin init failure for [schema.xml]
fieldType text: Plugin init failure for [schema.xml] analyzer/tokenizer:
Error loading class 'solr.StandardTokenizerFactory'
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
at
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
at
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4650)
at
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5306)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
at
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650)
at
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/tokenizer: Error loading class
'solr.StandardTokenizerFactory'
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:344)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 25 more
Caused by: org.apache.solr.common.SolrException: Error loading class
'solr.StandardTokenizerFactory'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:436)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:457)
at
org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:89)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 29 more
Caused by: java.lang.ClassNotFoundException: solr.StandardTokenizerFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:420)
... 32 more

I tried to debug it and found that it cannot resolve
'solr.StandardTokenizerFactory' because it searches for this class inside Solr,
when it is actually inside Lucene. I can update my schema.xml and replace all solr.*
short names with full class names, but I don't think that is correct, because
Solr runs properly with this schema on Tomcat 7 when not run from IntelliJ
Re: SQL DIH - Can I have some guidance please?

2012-12-04 Thread Spadez
Thank you so much for your help. Based on the same schema in my first post
and your help, I created this. Have I implemented it correctly based on your
suggestion? I tried to comment it: 

<dataConfig>
  <dataSource name="app-ds" driver="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
      password="secret" />
  <document>

    <entity dataSource="app-ds" name="item" query="SELECT id, title,
        description, date, link, location_id, source_id, company_id from item">

      <entity dataSource="app-ds" name="location" query="SELECT name, coordinate
          from location where location_id=${item.location_id}">
      <entity dataSource="app-ds" name="source" query="SELECT name from source
          where source_id=${item.source_id}">
      <entity dataSource="app-ds" name="company" query="SELECT name from company
          where company_id=${item.company_id}">

      <field column="id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="date" name="date" />
      <field column="link" name="link" />

      <field column="name" name="company_name" />
      <field column="name" name="source_name" />
      <field column="name" name="location_name" />
      <field column="coordinate" name="location_coordinates" />

    </entity>
    </entity>
  </document>
</dataConfig>






--
View this message in context: 
http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207p4024235.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to SWAP cores (or collections) with SolrCloud (SOLR-3866)

2012-12-04 Thread Andre Bois-Crettez

Hello,

With solr-4.0.0, the useful SWAP command
(http://wiki.apache.org/solr/CoreAdmin#SWAP), which allows having a main
core serving searches while a temp core is re-indexed from scratch,
no longer works on SolrCloud, as was discussed here: "Solr Swap Function
doesn't work when using Solr Cloud Beta"
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201209.mbox/%3ccalb4qrmi_fjpes8onyoeusbk1e51sqscxg6aaeulmvdl0f2...@mail.gmail.com%3E
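
For reference, the non-SolrCloud call we rely on looks like this (host and
port are assumptions):

http://localhost:8983/solr/admin/cores?action=SWAP&core=core_main&other=core_temp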

As we really need the ability to rebuild a complete index behind the
scenes without interrupting the search service, we tried some
workarounds, without success.

Typical setup is :
solr.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores defaultCoreName="core_main" adminPath="/admin/cores"
      zkClientTimeout="${zkClientTimeout:15000}" hostPort="8983"
      hostContext="solr">
    <core shard="shard1" instanceDir="instance_main" name="core_main"
        collection="collection_main" dataDir="data/" />
    <core shard="shard1" instanceDir="instance_temp" name="core_temp"
        collection="collection_temp" dataDir="data/" />
  </cores>
</solr>

solrconfig.xml etc. are all the same for all cores/collections.
We start 1 leader and 1 replica.
Excerpt of the tests/workarounds that were tried (please ask if details are
needed):

1.a switch instanceDir path in solr.xml, set /clusterstate.json to {} (to
reset it), restart leader
1.b switch instanceDir path and core names in solr.xml, set
/clusterstate.json to {}, restart leader
-> solr leader and replica don't respond anymore when clusterstate.json
is empty
-> leader: KeeperErrorCode = NoNode for
/collections/collection_temp/leaders/shard1
-> replica: "Could not find collection in zk: collection_main"

2.a switch instanceDir path in solr.xml, RELOAD cores / collections
2.b switch instanceDir path and core names in solr.xml, RELOAD cores /
collections
-> No visible effect

3. RENAME core_temp to core_main
-> query on collection_main returns data initially indexed to core_temp
(what we want), but collection_main is no longer in solr.xml and a 404
error is returned for any further document updates to collection_main

4. switch instanceDir directories on the filesystem, using a sequence of
dir moves, RELOAD cores
-> 500 error is returned for the replica cores' RELOAD, with a SEVERE error
in the log; it seems Solr did not like our messing with the Lucene files.
(Surprisingly, if there is no replica, this seems to work)

We also tried to use the SYNCSHARD collection command and to manually
set clusterstate.json with a new configuration, but these workarounds
don't work.


Now we are running short of ideas to progress further...
It seems that we really need to patch solr to properly implement SWAP
for cores (or collections by the way) in the context of SolrCloud.

So,
* what can we do to help progress on SOLR-3866? Maybe use-case
scenarios detailing desired behavior? Constraints on what cores or
collections are allowed to SWAP, i.e. same config, same doc-shard
assignments?
* our naive idea would be to update the /clusterstate.json config on ZK,
and propagate this configuration change to all Solr instances. Is that
even remotely realistic?

--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively 
for their addressees. If you are not the intended recipient of this message, 
please delete it and notify the sender.


Re: How to change Solr UI

2012-12-04 Thread Jack Krupansky

let's also be clear always that Solr is meant to be behind the firewall

Absolutely, but we are NOT doing that when we provide the Velocity-based 
/browse UI.


Erik, your email example sounds reasonable, so if you want to substitute 
something like that for the /browse handler, fine. As you point out, it is 
not Velocity per se, but the /browse UI that results in a lack of clarity 
about Solr being meant to be behind the firewall.


-- Jack Krupansky

-Original Message- 
From: Erik Hatcher

Sent: Tuesday, December 04, 2012 5:23 AM
To: solr-user@lucene.apache.org
Subject: Re: How to change Solr UI

It's a shame wt=velocity gets a bad rap because /update isn't out of the box 
strict with the HTTP/RESTful scene.  A delete should be a DELETE of some 
sort.


There are 3rd party standalone apps.  There was even a standalone ruby app 
(flare) that was once upon a time in Solr's svn, but really the Solr 
committers can't be expected to maintain all those various examples and keep 
them up to date and working, so best to keep them 3rd party IMO.  We've got 
Blacklight, VuFind, and all sorts of other front-ends out there with their 
own vibrant communities.


I'm -1 for removing VW (it's a contrib plugin as it is already, just like 
/update/extract).  /browse certainly could use a cleaning up / revamping, 
but it's good stuff if I do say so myself and very handy to have available 
for several reasons*.


Let's try not to conflate wt=velocity with /update being more easily 
dangerous than it probably should be.  But let's also be clear always that 
Solr is meant to be behind the firewall as its primary and default place in 
the world.


Erik

* One I'll share: There is a real-world use case of a (relatively big) 
company using wt=velocity to generate e-mail (for saved searches) texts very 
conveniently in a backend environment and very high speed, no other 
technologies/complexities needed in the mix but Solr and a little custom 
templating.


On Dec 3, 2012, at 20:58 , Jack Krupansky wrote:


It is annoying to have to repeat these explanations so much.

Any serious objection to removing the VW UI from Solr proper and replacing 
it with a standalone app?


I mean, Solr should have PHP, python, Java, and ruby example apps, right?

-- Jack Krupansky

-Original Message- From: Iwan Hanjoyo
Sent: Monday, December 03, 2012 8:28 PM
To: solr-user@lucene.apache.org
Subject: Re: How to change Solr UI




Note that Velocity _can_ be used for user-facing code, but be very sure 
you

secure your Solr. If you allow direct access, a user can easily enter
something like http://
solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>.
And all your documents will be gone.

Hi Erickson,


Thank you for the input.
I'll notice and filter out this url.
* http://
solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>

Kind regards,

Hanjoyo 




Re: SQL DIH - Can I have some guidance please?

2012-12-04 Thread Gora Mohanty
On 04/12/2012, Spadez james_will...@hotmail.com wrote:
 Thank you so much for your help. Based on the same schema in my first post
 and your help, I created this. Have I implemented it correctly based on your
 suggestion? I tried to comment it:

Looks almost correct. You only need two levels of
nesting, and can use proper nesting to differentiate
between the different name attributes.

<dataConfig>
  <dataSource name="app-ds" driver="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
      password="secret" />
  <document>
    <entity dataSource="app-ds" name="item" query="SELECT id,
        title, description, date, link, location_id, source_id, company_id
        from item">
      <field column="id" name="id" />
      <field column="title" name="title" />
      <field column="description" name="description" />
      <field column="date" name="date" />
      <field column="link" name="link" />

      <entity dataSource="app-ds" name="location" query="SELECT
          name, coordinate from location where location_id=${item.location_id}">
        <field column="name" name="location_name" />
        <field column="coordinate" name="location_coordinates" />
      </entity>

      <entity dataSource="app-ds" name="source"
          query="SELECT name from source where source_id=${item.source_id}">
        <field column="name" name="source_name" />
      </entity>

      <entity dataSource="app-ds" name="company"
          query="SELECT name from company where company_id=${item.company_id}">
        <field column="name" name="company_name" />
      </entity>
    </entity>
  </document>
</dataConfig>






 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207p4024235.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SQL DIH - Can I have some guidance please?

2012-12-04 Thread Spadez
Thank you so much for the help, I really appreciate it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SQL-DIH-Can-I-have-some-guidance-please-tp4024207p4024250.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to change Solr UI

2012-12-04 Thread Erik Hatcher

On Dec 4, 2012, at 08:21 , Jack Krupansky wrote:

 let's also be clear always that Solr is meant to be behind the firewall
 
 Absolutely, but we are NOT doing that when we provide the Velocity-based 
 /browse UI.
 Erik, your email example sounds reasonable, so if you want to substitute 
 something like that for the /browse handler, fine. As you point out, it is 
 not Velocity per se, but the /browse UI that results in a lack of clarity 
 about Solr being meant to be behind the firewall.

Point taken about being clear about this.  But I disagree about removing 
/browse.  It's useful, even if misunderstood/abused by some.  If there are 
spots where we need to be clearer about what it is that is being rendered, how 
it's rendered, and the pros/cons to it, then let's see about getting it 
mentioned more clearly.

But do keep in mind something like this example: having Solr return 
suggestion lists as plain text suitable for suggest interfaces, rather than 
having it return JSON or XML and having a middle tier process it, when all you 
need is a plain list or some CSV.  It's quite fine and sensible to use 
wt=velocity behind the firewall too, even /browse as-is.  Same as with the 
XSL transformation writing capability.

Erik

Re: How to change Solr UI

2012-12-04 Thread Upayavira
I have been mulling on this. The browse UI is getting a little out of
date, and has interesting 'features' such as only showing a map for a
document if the document has a 'name' field, which makes no real sense
at all.

Apart from renovating the UI of browse, or possibly replacing it with
something more modern based upon the new admin UI technology, it would
make sense to add a 'disclaimer' somewhere prominent on the browse
interface - title it 'Solr Demo' or 'Solr Prototype', and add a link to
a wiki page that explains *why* you shouldn't use this in production.
Apart from the security issues already mentioned, there's the MVC side -
you have a model and a view, but no controller, so it becomes hard to
build anything useful very quickly.

I'd happily hack disclaimers into place if considered useful.

Upayavira

On Tue, Dec 4, 2012, at 01:21 PM, Jack Krupansky wrote:
 let's also be clear always that Solr is meant to be behind the firewall
 
 Absolutely, but we are NOT doing that when we provide the Velocity-based 
 /browse UI.
 
 Erik, your email example sounds reasonable, so if you want to substitute 
 something like that for the /browse handler, fine. As you point out, it
 is 
 not Velocity per se, but the /browse UI that results in a lack of clarity 
 about Solr being meant to be behind the firewall.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Erik Hatcher
 Sent: Tuesday, December 04, 2012 5:23 AM
 To: solr-user@lucene.apache.org
 Subject: Re: How to change Solr UI
 
 It's a shame wt=velocity gets a bad rap because /update isn't out of the
 box 
 strict with the HTTP/RESTful scene.  A delete should be a DELETE of some 
 sort.
 
 There are 3rd party standalone apps.  There was even a standalone ruby
 app 
 (flare) that was once upon a time in Solr's svn, but really the Solr 
 committers can't be expected to maintain all those various examples and
 keep 
 them up to date and working, so best to keep them 3rd party IMO.  We've
 got 
 Blacklight, VuFind, and all sorts of other front-ends out there with
 their 
 own vibrant communities.
 
  I'm -1 for removing VW (it's a contrib plugin as it is already, just like 
 /update/extract).  /browse certainly could use a cleaning up / revamping, 
 but it's good stuff if I do say so myself and very handy to have
 available 
 for several reasons*.
 
 Let's try not to conflate wt=velocity with /update being more easily 
 dangerous than it probably should be.  But let's also be clear always
 that 
  Solr is meant to be behind the firewall as its primary and default place
 in 
 the world.
 
 Erik
 
 * One I'll share: There is a real-world use case of a (relatively big) 
 company using wt=velocity to generate e-mail (for saved searches) texts
 very 
 conveniently in a backend environment and very high speed, no other 
 technologies/complexities needed in the mix but Solr and a little custom 
 templating.
 
 On Dec 3, 2012, at 20:58 , Jack Krupansky wrote:
 
  It is annoying to have to repeat these explanations so much.
 
  Any serious objection to removing the VW UI from Solr proper and replacing 
  it with a standalone app?
 
  I mean, Solr should have PHP, python, Java, and ruby example apps, right?
 
  -- Jack Krupansky
 
  -Original Message- From: Iwan Hanjoyo
  Sent: Monday, December 03, 2012 8:28 PM
  To: solr-user@lucene.apache.org
  Subject: Re: How to change Solr UI
 
 
 
  Note that Velocity _can_ be used for user-facing code, but be very sure 
  you
  secure your Solr. If you allow direct access, a user can easily enter
  something like http://
   solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>.
  And all your documents will be gone.
 
  Hi Erickson,
 
  Thank you for the input.
  I'll notice and filter out this url.
  * http://
   solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>
 
  Kind regards,
 
  Hanjoyo 
 


Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

2012-12-04 Thread Upayavira
One small question - did you re-index in-between? The index structure
will be different for each.

Upayavira

On Tue, Dec 4, 2012, at 02:30 PM, Aaron Daubman wrote:
 Greetings,
 
 I'm finally updating an old instance and in testing, discovered that
 using
 the recommended TrieField instead of SortableIntField for range queries
 returns unexpected and seemingly incorrect results.
 
 A query with:
 
  q=*:*&fq=+i_yearStartSort:{* TO 1995}&fq=+i_yearStopSort:{* TO *}
 
 Should, and does under 1.4.1 with SortableIntField, only return docs that
 have some i_yearStopSort value and have an i_yearStartSort value less
 than
 1995.
 
  Unfortunately, under 3.6.1 with class="solr.TrieField" type="integer",
 this
 query is returning docs that have neither an i_yearStopSort nor a
 i_yearStartSort value.
 
 
 Here are the two schemas:
 
 Solr 1.4.1 Relevant Schema Parts - Working as desired:
 -
  <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true"
      omitNorms="true"/>
  ...
  <field name="i_yearStartSort" type="sint" indexed="true" stored="false"
      required="false" multiValued="true"/>
  <field name="i_yearStopSort" type="sint" indexed="true" stored="false"
      required="false" multiValued="true"/>
 
 
 Solr 3.6.1 Relevant Schema Parts - Not working as expected:
 -
  <fieldType name="tint" class="solr.TrieField" type="integer"
      precisionStep="4" sortMissingLast="true" positionIncrementGap="0"
      omitNorms="true"/>
  ...
  <field name="i_yearStartSort" type="tint" indexed="true" stored="false"
      required="false" multiValued="false"/>
  <field name="i_yearStopSort" type="tint" indexed="true" stored="false"
      required="false" multiValued="false"/>
 
 
 1) What is the best way to return to the desired/expected behavior?
 2) Can you explain to me why this happens?
 3) I have a sneaking suspicion (but could be totally wrong) that this
 relates to sortMissingLast=true - if it does, can you explain the
 seeming
 discrepancies in:
 SOLR-2881 and SOLR-2134? If I am reading these correctly, SOLR-2134 says
 this was fixed for Trie in 4.0, but not in 3.x... SOLR-2881 has a fix
 version of 3.5 listed, but some of the comments also seem to indicate
 this
 was not actually fixed in 3.5+
 
 Thanks,
  Aaron


Re: How to change Solr UI

2012-12-04 Thread Jack Krupansky
Or, maybe integrate /browse with the Solr Admin UI and give it a graphic 
treatment that screams that it is a development tool and not designed to be 
a model for an app UI.


And, I still think it would be good to include SOME example of a prototype 
app UI with Solr, to drive home the point of "here is [an example of] how 
you need to separate UI from Solr."


-- Jack Krupansky

-Original Message- 
From: Erik Hatcher

Sent: Tuesday, December 04, 2012 9:29 AM
To: solr-user@lucene.apache.org
Subject: Re: How to change Solr UI


On Dec 4, 2012, at 08:21 , Jack Krupansky wrote:


let's also be clear always that Solr is meant to be behind the firewall

Absolutely, but we are NOT doing that when we provide the Velocity-based 
/browse UI.
Erik, your email example sounds reasonable, so if you want to substitute 
something like that for the /browse handler, fine. As you point out, it is 
not Velocity per se, but the /browse UI that results in a lack of clarity 
about Solr being meant to be behind the firewall.


Point taken about being clear about this.  But I disagree about removing 
/browse.  It's useful, even if misunderstood/abused by some.  If there are 
spots where we need to be clearer about what it is that is being rendered, 
how it's rendered, and the pros/cons to it, then let's see about getting it 
mentioned more clearly.


But do keep in mind that something like this example: having Solr return 
suggestion lists as plain text suitable for suggest interfaces rather than 
having it return JSON or XML and having a middle tier process it when all 
you need is a plain list or some CSV.  It's quite fine and sensible to use 
wt=velocity behind the firewall too, even /browse as-is.  Same as with the 
XSL transformation writing capability.


Erik



Re: Problems while Searching Plural Form of Verb

2012-12-04 Thread Jack Krupansky
Use a stemmer, such as the English plural-only 
stemmer, EnglishMinimalStemFilterFactory.


See:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemFilterFactory.html
and
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemmer.html
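
Wired into a field type, that might look like the following - a sketch only;
the type name and surrounding analysis chain are just an example:

<fieldType name="text_en_plural" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>

With a single <analyzer> the same chain runs at index and query time, so
"dogs" and "dog" both reduce to "dog".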

-- Jack Krupansky

-Original Message- 
From: Pratyul Kapoor

Sent: Tuesday, December 04, 2012 9:25 AM
To: solr-user@lucene.apache.org
Subject: Problems while Searching Plural Form of Verb

Hello,

I am facing a problem with plural forms of words.
My index has words in their singular form, and if I search for the plural form of a
word, I get zero results.

Is there any way in Solr by which I can fix this?

Pratyul 



[Solrj 4.0] How use JOIN

2012-12-04 Thread Roman Slavík

Hi,

I can't find any good example of how to use the join function with the SolrJ 4.0 API.

Let's have this example data:

<doc>
  <field name="id">1</field>
  <field name="name">Thomas</field>
  <field name="age">40</field>
</doc>
<doc>
  <field name="id">2</field>
  <field name="name">John</field>
  <field name="age">17</field>
  <field name="parent">1</field>
</doc>

And code:

  String stringQuery = "(name:Thomas) AND (age:40)";
  SolrQuery query = new SolrQuery();
  query.setQuery(stringQuery);
  QueryResponse response = solrServer.query(query);

Result is doc with id=1. Now I want to extend the query above with a JOIN similar 
to this:


  {!join from=parent to=id}(name:John AND age:17)

and expect the same result - doc with id=1.  But I have no idea how to add this 
join condition to the SolrQuery object. Is there any method I missed? Or 
must I extend my base stringQuery somehow?
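
For what it's worth, one possibility - a sketch only, assuming the join is
meant to run from the child's parent field to the parent's id; the join
prefix is plain query syntax, so it can go straight into the query string:

  // {!join} local params are part of the query string itself
  SolrQuery query = new SolrQuery();
  query.setQuery("{!join from=parent to=id}(name:John AND age:17)");
  QueryResponse response = solrServer.query(query);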


Thanks for any advice!

Roman




Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

2012-12-04 Thread Aaron Daubman
Hi Upayavira,

One small question - did you re-index in-between? The index structure
 will be different for each.


Yes, the Solr 1.4.1 (working) instance was built using the original schema
and that solr version.
The Solr 3.6.1 (not working) instance was re-built using the new schema and
Solr 3.6.1...

Thanks,
  Aaron


Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

2012-12-04 Thread Aaron Daubman
I forgot a possibly important piece... Given the different Solr versions,
the schema version (and its related different defaults) is also a change:

Solr 1.4.1 Has:
<schema name="ourSchema" version="1.1">

Solr 3.6.1 Has:
<schema name="ourSchema" version="1.5">


 Solr 1.4.1 Relevant Schema Parts - Working as desired:

 
 -
   <fieldType name="sint" class="solr.SortableIntField"
       sortMissingLast="true" omitNorms="true"/>
   ...
   <field name="i_yearStartSort" type="sint" indexed="true" stored="false"
       required="false" multiValued="true"/>
   <field name="i_yearStopSort" type="sint" indexed="true" stored="false"
       required="false" multiValued="true"/>
 
 
  Solr 3.6.1 Relevant Schema Parts - Not working as expected:
 
 -
   <fieldType name="tint" class="solr.TrieField" type="integer"
       precisionStep="4" sortMissingLast="true" positionIncrementGap="0"
       omitNorms="true"/>
   ...
   <field name="i_yearStartSort" type="tint" indexed="true" stored="false"
       required="false" multiValued="false"/>
   <field name="i_yearStopSort" type="tint" indexed="true" stored="false"
       required="false" multiValued="false"/>



Re: Cannot run Solr4 from Intellij Idea

2012-12-04 Thread Aaron Daubman
Interestingly, I have run into this same (or very similar) issue when
attempting to run embedded Solr. All of the solr.* classes that were
recently moved to Lucene would not work with the solr.* shorthand - I had
to replace them with the fully-qualified class names. As you found, these
shorthands in the same schema worked fine from within Solr proper (the webapp).

Is there a workaround for this? (It would be great to have a unified schema
between embedded and webapp solr instances)

Thanks,
 Aaron


On Tue, Dec 4, 2012 at 7:37 AM, Artyom ice...@mail.ru wrote:

 After 2 days I have figured out how to open Solr 4 in IntelliJ IDEA 11.1.4
 on
 Tomcat 7. IntelliJ IDEA finds webapp/web/WEB-INF/web.xml and offers to make
 a facet from it and adds this facet to the parent module, from which an
 artifact can be created.

 The problem is that Solr cannot run properly. I get this message:

 SEVERE: Unable to create core: mycore
 org.apache.solr.common.SolrException: Plugin init failure for [schema.xml]
 fieldType text: Plugin init failure for [schema.xml] analyzer/tokenizer:
 Error loading class 'solr.StandardTokenizerFactory'
 at

 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
 at
 org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
  at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:113)
 at
 org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
 at

 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
 at

 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
 at

 org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
 at

 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
 at

 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
 at

  org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
 at

 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4650)
 at

 org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5306)
 at
 org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
 at

 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
 at
 org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
 at
 org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
 at

 org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650)
 at

 org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.solr.common.SolrException: Plugin init failure for
 [schema.xml] analyzer/tokenizer: Error loading class
 'solr.StandardTokenizerFactory'
 at

 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
 at

 org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:344)
 at

 org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
 at

 org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
 at

 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
 ... 25 more
 Caused by: org.apache.solr.common.SolrException: Error loading class
 'solr.StandardTokenizerFactory'
 at

 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:436)
 at

 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:457)
 at

 org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:89)
 at

 org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
 ... 29 more
 Caused by: java.lang.ClassNotFoundException: solr.StandardTokenizerFactory
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at
 

Re: Luke and SOLR search giving different results

2012-12-04 Thread Erol Akarsu
Thanks Shawn and Jack,

I changed solrconfig to set the default query field (qf) to the field "content". It
works fine now.
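
The change was presumably along these lines - handler name and layout are
assumptions; with the standard query parser the parameter is df, with
(e)dismax it is qf:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">content</str>
  </lst>
</requestHandler>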

Erol Akarsu

On Mon, Dec 3, 2012 at 5:03 PM, Shawn Heisey s...@elyograg.org wrote:

 On 12/3/2012 1:44 PM, Erol Akarsu wrote:

 I tried as search query not "baş" but "features:baş" in field q in the
 SOLR GUI. And, I got a result!

 In the one document, I had some fields of type text_eng and text_general, and
 one field "features" of type text_tr. If I don't specify a field name, SOLR
 uses the EnglishAnalyzer. If I do, it uses the analyzer specific to the field
 specified in the search query string.


 Your config is set up to search against a field named text by default -
 either by a setting in schema.xml or a df parameter in your search
 handler definition in solrconfig.xml.  If you are using (e)dismax, it might
 be qf/pf parameters instead of df.

 The field named text is not properly set up for this search.  Your
 attachment at the beginning of this thread indicates that either you do not
 have a text field for this document at all, or that field is not stored.
  If the text field is a copyField as Jack has mentioned, note that it
 doesn't matter what analysis you are doing on features -- the copy is done
 before analysis, so it is completely separate.

 Thanks,
 Shawn




Re: Backing up SolR 4.0

2012-12-04 Thread Shawn Heisey

On 12/4/2012 1:55 AM, Andy D'Arcy Jewell wrote:
Is there an easy way to tell (say from a shell script) when all 
commits and merges [are] complete?


One important bit of information I just thought of: A default Solr 4 
config uses a new directory implementation called NRTCachingDirectory, 
which in some circumstances may keep part of the newest segment(s) in 
RAM.  I *hope* that issuing an explicit hard commit will flush that to 
disk, but I am not sure.  You *might* need to switch to the old 
directory implementation to be sure a hardlink backup is complete.  Can 
one of the committers please comment on this?  Assuming that we work out 
this detail, the rest of what I've said will be valid.


Detecting when commits are done has to be coordinated with your indexing 
program, so depending on how your system works, kicking off the make 
hardlinks process might need to be part of your indexing program.  As 
for merges, that's a bit tougher, because Solr 4 and later will do 
merges in the background after informing your indexing program that the 
commit is complete.
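
A hard commit can be issued explicitly once indexing is paused - a minimal
sketch, with host and core path assumed:

curl 'http://localhost:8983/solr/update?commit=true'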


If you grab a hardlink copy while a merge is happening, I do not believe 
it will be corrupt in any way, but it may be larger than expected 
because it will contain the new segments from the merge. Those segments 
would not be referenced by the segments.nnn file, so I *think* that if 
you then load that index into Solr, it would ignore the other segments.  
I am not sure about that, though.  You might be able to use a command 
like the following to delete the newer segments from the copy, but I 
would not do it without experimentation to be sure it's actually 
required, and that it never wipes anything out that you actually need:


find . -type f -newer segments.gen | xargs rm -f

If I keep a replica solely for backup purposes, I assume I can do 
what I like with it - presumably replication will resume/catch-up 
when I resume it (I admit, I have a bit of reading to do wrt 
replication - I just skimmed that because it wasn't in my initial brief).


As long as the replica server isn't being actively updated or used for 
queries and you temporarily turn off replication, you should be able to 
do whatever you want with its index.


I'm assuming that because you're using hardlinks, that means that SolR 
writes a new file when it updates (sort of copy-on-write style)? So 
we are relying on the principle that as long as you have at least one 
remaining reference to the data, it's not deleted...


Yes. Lucene (which Solr uses under the hood) never touches segment files 
once they have been written. It only deletes segment files in two 
circumstances: 1) Every document in that segment has been deleted from 
the index.  2) The data in that segment has been written to a new 
segment.  The combination of Lucene's update method and hardlink 
functionality will ensure that the hardlink copy is always good.


Thanks,
Shawn



Re: How to change Solr UI

2012-12-04 Thread Upayavira
That's an interesting take. 

I agree that Solr needs *something* for folks to use. It is unfortunate
that Solr actually has a functioning HTTP infrastructure, because it
then makes less sense to build an alternative one up. E.g. How about:

http://localhost:8983/solr  - admin UI
http://localhost:8983/browse - separate browse webapp

It would be a separate app that runs as another webapp, accessing Solr
via HTTP just as any other app would.

It could still use Velocity, but would demonstrate that you shouldn't
integrate your app with Solr. A minimal dependency app for demonstration
purposes only.

Upayavira

On Tue, Dec 4, 2012, at 02:37 PM, Jack Krupansky wrote:
 Or, maybe integrate /browse with the Solr Admin UI and give it a graphic 
 treatment that screams that it is a development tool and not designed to
 be 
 a model for an app UI.
 
 And, I still think it would be good to include SOME example of a
 prototype 
 app UI with Solr, to drive home the point of here is [an example of] how 
 you need to separate UI from Solr.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Erik Hatcher
 Sent: Tuesday, December 04, 2012 9:29 AM
 To: solr-user@lucene.apache.org
 Subject: Re: How to change Solr UI
 
 
 On Dec 4, 2012, at 08:21 , Jack Krupansky wrote:
 
  let's also be clear always that Solr is meant to be behind the firewall
 
  Absolutely, but we are NOT doing that when we provide the Velocity-based 
  /browse UI.
  Erik, your email example sounds reasonable, so if you want to substitute 
  something like that for the /browse handler, fine. As you point out, it is 
  not Velocity per se, but the /browse UI that results in a lack of clarity 
  about Solr being meant to be behind the firewall.
 
 Point taken about being clear about this.  But I disagree about removing 
 /browse.  It's useful, even if misunderstood/abused by some.  If there
 are 
 spots where we need to be clearer about what it is that is being
 rendered, 
 how it's rendered, and the pros/cons to it, then let's see about getting
 it 
 mentioned more clearly.
 
 But do keep in mind that something like this example: having Solr return 
 suggestion lists as plain text suitable for suggest interfaces rather
 than 
 having it return JSON or XML and having a middle tier process it when all 
 you need is a plain list or some CSV.  It's quite fine and sensible to
 use 
 wt=velocity behind the firewall too, even /browse as-is.  Same as with
 the 
 XSL transformation writing capability.
 
 Erik
 


Re: How to change Solr UI

2012-12-04 Thread Erik Hatcher
And basically that's what i had in mind with Prism here: 
https://github.com/lucidimagination/Prism
Prism's very lightweight, uses Velocity (or not, any Ruby templating technology 
available), and is entirely separate from Solr.  Before that there was Flare: 
https://github.com/erikhatcher/solr-ruby-flare/tree/master/flare.  Prism is 
the approach I'd (obviously) take these days, and it's getting some more 
attention, it looks like, soon.

Blacklight and VuFind are much more richly capable.

So there's options already out there, and surely many others that I don't even 
mention.  A new top-level wiki page seems warranted from this discussion from 
http://wiki.apache.org/solr/FrontPage to list off all the various front-ends 
available.

Erik



On Dec 4, 2012, at 12:11 , Upayavira wrote:

 That's an interesting take. 
 
 I agree that Solr needs *something* for folks to use. It is unfortunate
 that Solr actually has a functioning HTTP infrastructure, because it
 then makes less sense to build an alternative one up. E.g. How about:
 
 http://localhost:8983/solr  - admin UI
 http://localhost:8983/browse - separate browse webapp
 
 It would be a separate app that runs as another webapp, accessing Solr
 via HTTP just as any other app would.
 
 It could still use Velocity, but would demonstrate that you shouldn't
 integrate your app with Solr. A minimal dependency app for demonstration
 purposes only.
 
 Upayavira
 
 On Tue, Dec 4, 2012, at 02:37 PM, Jack Krupansky wrote:
 Or, maybe integrate /browse with the Solr Admin UI and give it a graphic 
 treatment that screams that it is a development tool and not designed to
 be 
 a model for an app UI.
 
 And, I still think it would be good to include SOME example of a
 prototype 
 app UI with Solr, to drive home the point of here is [an example of] how 
 you need to separate UI from Solr.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Erik Hatcher
 Sent: Tuesday, December 04, 2012 9:29 AM
 To: solr-user@lucene.apache.org
 Subject: Re: How to change Solr UI
 
 
 On Dec 4, 2012, at 08:21 , Jack Krupansky wrote:
 
 let's also be clear always that Solr is meant to be behind the firewall
 
 Absolutely, but we are NOT doing that when we provide the Velocity-based 
 /browse UI.
 Erik, your email example sounds reasonable, so if you want to substitute 
 something like that for the /browse handler, fine. As you point out, it is 
 not Velocity per se, but the /browse UI that results in a lack of clarity 
 about Solr being meant to be behind the firewall.
 
 Point taken about being clear about this.  But I disagree about removing 
 /browse.  It's useful, even if misunderstood/abused by some.  If there
 are 
 spots where we need to be clearer about what it is that is being
 rendered, 
 how it's rendered, and the pros/cons to it, then let's see about getting
 it 
 mentioned more clearly.
 
 But do keep in mind that something like this example: having Solr return 
 suggestion lists as plain text suitable for suggest interfaces rather
 than 
 having it return JSON or XML and having a middle tier process it when all 
 you need is a plain list or some CSV.  It's quite fine and sensible to
 use 
 wt=velocity behind the firewall too, even /browse as-is.  Same as with
 the 
 XSL transformation writing capability.
 
 Erik
 



Re: SolrCloud : impossible to create a new collection

2012-12-04 Thread Mark Miller

On Dec 4, 2012, at 5:57 AM, LEFEBVRE Guillaume guillaume.lefeb...@cegedim.fr 
wrote:

 Hello,
 
 I have a SolrCloud environment with 2 collections running perfectly.
 
 I would like to create a new collection using :  
 http://localhost:8080/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&numReplicas=2
 But nothing happens!
 
 Could you help me please ?
 
 Best regards,
 Guillaume

Check the logs to see the result - probably the logs of the first instance you 
started (the overseer). It should indicate why the collection was not created.

Before long we will have the call itself return the results so you don't have 
to dig into the logs for this.

- Mark

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

2012-12-04 Thread Chris Hostetter

: q=*:*&fq=+i_yearStartSort:{* TO 1995}&fq=+i_yearStopSort:{* TO *}
...
: Unfortunately, under 3.6.1 with class="solr.TrieField" type="integer", this
: query is returning docs that have neither an i_yearStopSort nor an
: i_yearStartSort value.

H... I can't seem to reproduce this.

Here's what i tried...

1) start up the Solr 3.6.1 example

2) index the 3.6.1 example docs...
java -jar post.jar *.xml

3) index a single doc using some *_ti dynamic fields (which use 
tint)...
java -Ddata=args -jar post.jar '<add><doc><field name="id">HOSS</field><field 
name="start_ti">45</field><field name="end_ti">100</field></doc></add>'

If i do some open ended range queries on the *_ti fields, i get the 
results i expect (either only my HOSS doc if it's in the ranges, or no 
docs if HOSS is out of range)...

Matches HOSS...
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%2050}&fl=start_ti,id,end_ti
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%2050}&fq=end_ti:{*%20TO%20*}&fl=start_ti,id,end_ti

Matches nothing...
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%205}&fl=start_ti,id,end_ti
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%205}&fq=end_ti:{*%20TO%20*}&fl=start_ti,id,end_ti

I repeated the test after deleting all data, and adding 
sortMissingLast="true" to the example "tint" fieldType, and got the same 
results.

: Solr 3.6.1 Relevant Schema Parts - Not working as expected:
: -
: <fieldType name="tint" class="solr.TrieField" type="integer"
:   precisionStep="4" sortMissingLast="true" positionIncrementGap="0"
:   omitNorms="true"/>

FYI: you have some wackiness there: 'type="integer"' inside the 
'<fieldType name="tint" .../>' ... that shouldn't have caused any problems 
though, but it doesn't make any sense. 

: <field name="i_yearStartSort" type="tint" indexed="true" stored="false"
:   required="false" multiValued="false"/>
: <field name="i_yearStopSort" type="tint" indexed="true" stored="false"
:   required="false" multiValued="false"/>

can you try changing those to stored="true" and re-indexing as a sanity 
check? perhaps your indexing code is putting a default value in that 
you aren't realizing?

w/o more specifics (ie: sample docs to index) on how to reproduce, i can't 
seem to find any problem.


-Hoss


Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

2012-12-04 Thread Jack Krupansky
Could you show us some input data, both WITH an i_yearStopSort value and 
WITHOUT the value?


I tried a quick test using the stock Solr 3.6.1 example schema and a dynamic 
integer field and the filter query did in fact filter out all documents that 
did not have a value in that field:


http://localhost:8983/solr/select?q=*:*&fq=%2bx_i:{*+TO+*}

Maybe you could come up with a simple sample Solr XML document that can be 
added to the stock 3.6.1 example schema that shows the problem.


-- Jack Krupansky

-Original Message- 
From: Aaron Daubman

Sent: Tuesday, December 04, 2012 9:30 AM
To: solr-user@lucene.apache.org
Subject: Range Queries performing differently on SortableIntField vs 
TrieField of type integer


Greetings,

I'm finally updating an old instance and in testing, discovered that using
the recommended TrieField instead of SortableIntField for range queries
returns unexpected and seemingly incorrect results.

A query with:

q=*:*&fq=+i_yearStartSort:{* TO 1995}&fq=+i_yearStopSort:{* TO *}

Should, and does under 1.4.1 with SortableIntField, only return docs that
have some i_yearStopSort value and have an i_yearStartSort value less than
1995.

Unfortunately, under 3.6.1 with class="solr.TrieField" type="integer", this
query is returning docs that have neither an i_yearStopSort nor an
i_yearStartSort value.


Here are the two schemas:

Solr 1.4.1 Relevant Schema Parts - Working as desired:
-
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true"
  omitNorms="true"/>
...
<field name="i_yearStartSort" type="sint" indexed="true" stored="false"
  required="false" multiValued="true"/>
<field name="i_yearStopSort" type="sint" indexed="true" stored="false"
  required="false" multiValued="true"/>


Solr 3.6.1 Relevant Schema Parts - Not working as expected:
-
<fieldType name="tint" class="solr.TrieField" type="integer"
  precisionStep="4" sortMissingLast="true" positionIncrementGap="0"
  omitNorms="true"/>
...
<field name="i_yearStartSort" type="tint" indexed="true" stored="false"
  required="false" multiValued="false"/>
<field name="i_yearStopSort" type="tint" indexed="true" stored="false"
  required="false" multiValued="false"/>


1) What is the best way to return to the desired/expected behavior?
2) Can you explain to me why this happens?
3) I have a sneaking suspicion (but could be totally wrong) that this
relates to sortMissingLast=true - if it does, can you explain the seeming
discrepancies in:
SOLR-2881 and SOLR-2134? If I am reading these correctly, SOLR-2134 says
this was fixed for Trie in 4.0, but not in 3.x... SOLR-2881 has a fix
version of 3.5 listed, but some of the comments also seem to indicate this
was not actually fixed in 3.5+

Thanks,
Aaron 



SolrCell takes InputStream

2012-12-04 Thread Bing Hua
Hi,

While using ContentStreamUpdateRequest up = new
ContentStreamUpdateRequest("/update/extract");

The two ways of adding a file are
up.addFile(File)
up.addContentStream(ContentStream)

However my raw files are stored on some remote storage devices. I am able to
get an InputStream object for the file to be indexed. To me it may seem
awkward to have the file temporarily stored locally. Is there a way of
directly passing the InputStream in (e.g. constructing ContentStream using
the InputStream)?

Thanks.
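
One approach that may work is to subclass ContentStreamBase so it hands back
the InputStream directly (a sketch only; openRemoteFile() is a hypothetical
stand-in for the remote storage client, and the content type is illustrative):

import java.io.IOException;
import java.io.InputStream;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;

public class StreamingExtract {
    public static void main(String[] args) throws Exception {
        ContentStreamUpdateRequest up =
            new ContentStreamUpdateRequest("/update/extract");
        final InputStream remote = openRemoteFile();  // hypothetical storage call
        ContentStreamBase stream = new ContentStreamBase() {
            @Override
            public InputStream getStream() throws IOException {
                return remote;  // streamed straight through, no temp file
            }
        };
        stream.setContentType("application/octet-stream");
        up.addContentStream(stream);
    }

    private static InputStream openRemoteFile() {
        throw new UnsupportedOperationException("stand-in for the storage API");
    }
}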





Creating a collection without bootstrap

2012-12-04 Thread Walter Underwood
I seem to be missing a step or some kind of ordering in creating a new 
collection without using bootstrap upload. I have these steps:

* zookeeper upconfig (pretty sure this is first)
* Collection API create collection
* zookeeper linkconfig

I'm working from this page: http://wiki.apache.org/solr/SolrCloud

A step-by-step recipe would be really nice.

wunder
--
Walter Underwood
wun...@wunderwood.org
Search Guy, Chegg.com





Re: Loading DictionaryCompoundWordTokenFilterFactory as shared object across all cores

2012-12-04 Thread Chris Hostetter
: Do we have any way to load
: DictionaryCompoundWordTokenFilterFactory only once and share it across all
: the cores?

I don't think so, but there are tricks you can use in a custom plugin 
variant depending on your use cases, as well as a really easy solution if 
the schemas for all of your collections are identical...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201210.mbox/%3Calpine.DEB.2.02.1210161830351.31983@frisbee%3E

-Hoss


Re: Sorting by multi-valued field

2012-12-04 Thread Chris Hostetter

: perfectly, but users expect the result set to be ordered by the next start
: time.
...
: Is there a more elegant way to do this in Solr? A function query or
: subquery maybe? I thought about it for quite a while and couldn't come up
: with a viable solution.

I think you could conceivably write a custom function that built an 
UnInvertedField over your multivalued field, and then returned the lowest 
value for each doc where the value is after 'NOW', but there is nothing 
out of the box that will do this for you (and i haven't really thought 
hard about how viable this approach is ... i can't think of any obvious 
problems off the top of my head)

-Hoss


Re: Sorting by multi-valued field

2012-12-04 Thread Upayavira
But it would be a lot harder than either splitting them out into
separate docs, or writing code to re-index docs when one of their
'next-event' dates passes, with a new single valued 'next-event' field.
Less efficient, but easier to write/manage.

Upayavira

On Tue, Dec 4, 2012, at 07:35 PM, Chris Hostetter wrote:
 
 : perfectly, but users expect the result set to be ordered by the next
 start
 : time.
   ...
 : Is there a more elegant way to do this in Solr? A function query or
 : subquery maybe? I thought about it for quite a while and couldn't come
 up
 : with a viable solution.
 
 I think you could conceivably write a custom function that built an 
 UnInvertedField over your multivalued field, and then returned the
 lowest 
 value for each doc where the value is after 'NOW', but there is nothing 
 out of the box that will do this for you (and i haven't really thought 
 hard about how viable this approach is ... i can't think of any obvious 
 problems off the top of my head)
 
 -Hoss


Re: Sorting by multi-valued field

2012-12-04 Thread Chris Hostetter

: But it would be a lot harder than either splitting them out into
: separate docs, or writing code to re-index docs when one of their
: 'next-event' dates passes, with a new single valued 'next-event' field.
: Less efficient, but easier to write/manage.

Don't get me wrong -- if you can determine at index time which single 
value you want to use to sort on, then by all means that is going to be the 
best approach -- it's precisely the reason why 
FirstFieldValueUpdateProcessorFactory, 
LastFieldValueUpdateProcessorFactory, MaxFieldValueUpdateProcessorFactory, 
and MinFieldValueUpdateProcessorFactory exist.

But my interpretation of "the next start time" is that it was dependent on 
the value of NOW when the query was executed (ie: some of the indexed 
values may be in the past), in which case that approach wouldn't work.

: On Tue, Dec 4, 2012, at 07:35 PM, Chris Hostetter wrote:
:  
:  : perfectly, but users expect the result set to be ordered by the next
:  start
:  : time.
:  ...
:  : Is there a more elegant way to do this in Solr? A function query or
:  : subquery maybe? I thought about it for quite a while and couldn't come
:  up
:  : with a viable solution.
:  
:  I think you could conceivably write a custom function that built an 
:  UnInvertedField over your multivalued field, and then returned the
:  lowest 
:  value for each doc where the value is after 'NOW', but there is nothing 
:  out of the box that will do this for you (and i haven't really thought 
:  hard about how viable this approach is ... i can't think of any obvious 
:  problems off the top of my head)
:  
:  -Hoss
: 

-Hoss


Re: How to change Solr UI

2012-12-04 Thread Upayavira
But there's value in having something packaged within Solr itself, for
demo purposes.

That would, I suspect, make it Java (like it or not!) And that would
probably not make it very state-of-the-art, unless it used jQuery, with
a very lightweight Java portion, which would be possible.

Upayavira

On Tue, Dec 4, 2012, at 05:42 PM, Erik Hatcher wrote:
 And basically that's what i had in mind with Prism here:
 https://github.com/lucidimagination/Prism
 Prism's very lightweight, uses Velocity (or not, any Ruby templating
 technology available), and is entirely separate from Solr.  Before that
 there was Flare:
 https://github.com/erikhatcher/solr-ruby-flare/tree/master/flare.   
 Prism is the approach I'd (obviously) take these days, and it's getting
 some more attention, it looks like, soon.
 
 Blacklight and VuFind are much more richly capable.
 
 So there's options already out there, and surely many others that I don't
 even mention.  A new top-level wiki page seems warranted from this
 discussion from http://wiki.apache.org/solr/FrontPage to list off all
 the various front-ends available.
 
   Erik
 
 
 
 On Dec 4, 2012, at 12:11 , Upayavira wrote:
 
  That's an interesting take. 
  
  I agree that Solr needs *something* for folks to use. It is unfortunate
  that Solr actually has a functioning HTTP infrastructure, because it
  then makes less sense to build an alternative one up. E.g. How about:
  
  http://localhost:8983/solr  - admin UI
  http://localhost:8983/browse - separate browse webapp
  
  It would be a separate app that runs as another webapp, accessing Solr
  via HTTP just as any other app would.
  
  It could still use Velocity, but would demonstrate that you shouldn't
  integrate your app with Solr. A minimal dependency app for demonstration
  purposes only.
  
  Upayavira
  
  On Tue, Dec 4, 2012, at 02:37 PM, Jack Krupansky wrote:
  Or, maybe integrate /browse with the Solr Admin UI and give it a graphic 
  treatment that screams that it is a development tool and not designed to
  be 
  a model for an app UI.
  
  And, I still think it would be good to include SOME example of a
  prototype 
  app UI with Solr, to drive home the point of here is [an example of] how 
  you need to separate UI from Solr.
  
  -- Jack Krupansky
  
  -Original Message- 
  From: Erik Hatcher
  Sent: Tuesday, December 04, 2012 9:29 AM
  To: solr-user@lucene.apache.org
  Subject: Re: How to change Solr UI
  
  
  On Dec 4, 2012, at 08:21 , Jack Krupansky wrote:
  
  let's also be clear always that Solr is meant to be behind the firewall
  
  Absolutely, but we are NOT doing that when we provide the Velocity-based 
  /browse UI.
  Erik, your email example sounds reasonable, so if you want to substitute 
  something like that for the /browse handler, fine. As you point out, it 
  is 
  not Velocity per se, but the /browse UI that results in a lack of clarity 
  about Solr being meant to be behind the firewall.
  
  Point taken about being clear about this.  But I disagree about removing 
  /browse.  It's useful, even if misunderstood/abused by some.  If there
  are 
  spots where we need to be clearer about what it is that is being
  rendered, 
  how it's rendered, and the pros/cons to it, then let's see about getting
  it 
  mentioned more clearly.
  
  But do keep in mind that something like this example: having Solr return 
  suggestion lists as plain text suitable for suggest interfaces rather
  than 
  having it return JSON or XML and having a middle tier process it when all 
  you need is a plain list or some CSV.  It's quite fine and sensible to
  use 
  wt=velocity behind the firewall too, even /browse as-is.  Same as with
  the 
  XSL transformation writing capability.
  
  Erik
  
 


Re: Creating a collection without bootstrap

2012-12-04 Thread Walter Underwood
Here is one problem. On the SolrCloud wiki page, it says "link collection sets 
to collections", but I'm pretty sure that should read "config set".

Also, "config set" (or "conf set") is never defined.

wunder

On Dec 4, 2012, at 11:07 AM, Walter Underwood wrote:

 I seem to be missing a step or some kind of ordering in creating a new 
 collection without using bootstrap upload. I have these steps:
 
 * zookeeper upconfig (pretty sure this is first)
 * Collection API create collection
 * zookeeper linkconfig
 
 I'm working from this page: http://wiki.apache.org/solr/SolrCloud
 
 A step-by-step recipe would be really nice.
 
 wunder
 --
 Walter Underwood
 wun...@wunderwood.org
 Search Guy, Chegg.com





Solr 4 : Optimize very slow

2012-12-04 Thread Sandeep Mestry
Hi All,

I have recently migrated from solr 1.4 to solr 4 and have done the basic
changes required for solr 4 in solrconfig.xml and schema.xml. I have also
rebuilt the index set for solr 4.
We run optimize every morning at 4 am and we keep the index updates off
during this process.
Previously, with 1.4, the optimization used to take around 20-30 mins per
shard, but now with Solr 4 it's taking 6-8 hours or even more.
I have also tested the optimize from the Solr UI and that takes 6-8 hours too.
The hardware is the same, and we have deployed Solr under WAS.
There are 4 shards and every shard contains around 8-9 Gig of data, and we
are using a master-slave configuration with rsync. I have not enabled soft
commit. Also, the committer process is scheduled to run every minute.

I am not sure which part I'm missing, do let me know your inputs please.

Many Thanks in advance,
Sandeep


Re: Replication error and Shard Inconsistencies..

2012-12-04 Thread Mark Miller
Hey Annette, 

Are you using Solr 4.0 final? A version of 4x or 5x?

Do you have the logs for when the replica tried to catch up to the leader?

Stopping and starting the node is actually a fine thing to do. Perhaps you can 
try it again and capture the logs.

If a node is not listed as live but is in the clusterstate, that is fine. It 
shouldn't be consulted. To remove it, you either have to unload it with the 
core admin api or you could manually delete its registered state under the 
node states node that the Overseer looks at.

Also, it would be useful to see the logs of the new node coming up…there should 
be info about what happens when it tries to replicate.

It almost sounds like replication is just not working for your setup at all and 
that you have to tweak some configuration. You shouldn't see these nodes as 
active then though - so we should get to the bottom of this.

- Mark

On Dec 4, 2012, at 4:37 AM, Annette Newton annette.new...@servicetick.com 
wrote:

 Hi all,
  
 I have a quite weird issue with Solr cloud.  I have a 4 shard, 2 replica 
 setup. Yesterday one of the nodes lost communication with the cloud setup, 
 which resulted in it trying to run replication. This failed, which has left 
 me with a shard (Shard 4) that has 2,833,940 documents on the 
 leader and 409,837 on the follower – obviously a big discrepancy, and this 
 leads to queries returning differing results depending on which of these 
 nodes it gets the data from.  There is no indication of a problem on the 
 admin site other than the big discrepancy in the number of documents.  They 
 are all marked as active etc…
  
 So I thought that I would force replication to happen again, by stopping and 
 starting solr (probably the wrong thing to do) but this resulted in no 
 change.  So I turned off that node and replaced it with a new one.  In 
 zookeeper, live nodes doesn’t list that machine but it is still being shown as 
 active in the ClusterState.json; I have attached images showing this…  
 This means the new node hasn’t replaced the old node but is now a replica on 
 Shard 1!  Also that node doesn’t appear to have replicated Shard 1’s data 
 anyway, it didn’t get marked with replicating or anything… 
  
 How do I clear the zookeeper state without taking down the entire solr cloud 
 setup?  How do I force a node to replicate from the others in the shard?
  
 Thanks in advance.
  
 Annette Newton
  
  
 LiveNodes.zip



Re: SOLR4 cluster - strange CPU spike on slave

2012-12-04 Thread Mark Miller

On Dec 4, 2012, at 2:25 AM, John Nielsen j...@mcb.dk wrote:

 The post about MMapDirectory is really interesting. We switched to using
 that from NRTCachingDirectory and I am monitoring performance as well.
 Initially performance doesn't look stellar, but I suspect that we lack
 memory in the server to really make it shine.

NRTCachingDirectory delegates to another directory and simply caches small 
segments in RAM - usually it delegates to MMapDirectory by default. So likely you 
won't notice any changes, because you likely have not really changed anything. 
NRTCachingDirectory simply helps in the NRT case and doesn't really hurt, that 
I've seen, in the std case. It's more like a helper dir than a replacement.
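
At the Lucene level, that delegation looks roughly like the sketch below
(assuming Lucene 4's constructors; the path and cache thresholds are
illustrative, not Solr's defaults):

import java.io.File;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

public class NrtCachingExample {
    public static void main(String[] args) throws Exception {
        Directory delegate = new MMapDirectory(new File("/var/solr/data/index"));
        // Small, freshly flushed segments are cached in RAM; everything else
        // passes straight through to the MMapDirectory delegate.
        Directory dir = new NRTCachingDirectory(delegate, 4.0, 48.0);
        System.out.println(dir);
    }
}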

- Mark

Re: Solr 4 : Optimize very slow

2012-12-04 Thread Otis Gospodnetic
Hi,

You should search the ML archives for : optimize wunder Erick Otis :)

Is WAS really AWS? If so, if these are new EC2 instances you are
unfortunately unable to do a fair apples to apples comparison. Have you
tried a different set of instances?

Otis
--
Performance Monitoring - http://sematext.com/spm
On Dec 4, 2012 6:29 PM, Sandeep Mestry sanmes...@gmail.com wrote:

 Hi All,

 I have recently migrated from solr 1.4 to solr 4 and have done the basic
 changes required for solr 4 in solrconfig.xml and schema.xml. I have also
 rebuilt the index set for solr 4.
 We run optimize every morning at 4 am and we keep the index updates off
 during this process.
 Previously, with 1.4, the optimization used to take around 20-30 mins per
 shard, but now with Solr 4 it's taking 6-8 hours or even more.
 I have also tested the optimize from the Solr UI and that takes 6-8 hours too.
 The hardware is the same, and we have deployed Solr under WAS.
 There are 4 shards and every shard contains around 8-9 Gig of data, and we
 are using a master-slave configuration with rsync. I have not enabled soft
 commit. Also, the committer process is scheduled to run every minute.

 I am not sure which part I'm missing, do let me know your inputs please.

 Many Thanks in advance,
 Sandeep



Getting deleted documents during DIH full-import

2012-12-04 Thread Shawn Heisey
I am doing a DIH full import on a very recent checkout from branch_4x.  
Something I've recently done differently is enabling autocommit.  I am 
seeing that there are deleted documents in some of the indexes.  See 
"Development Build Indexes" at the bottom of the following screenshot.  
When the import is complete, the numbered shards will contain 13 million 
documents.


http://dl.dropbox.com/u/97770508/statuspage-deletes-import.png

The MySQL database that this imports from has a unique index on the 
field that Solr is using for its UniqueKey, so it's not possible to have 
duplicates.  Each import uses one SELECT statement for the entire 13 
million document import.  What might be leading to these deleted docs?


Thanks,
Shawn



Re: Getting deleted documents during DIH full-import

2012-12-04 Thread Shawn Heisey

On 12/4/2012 5:33 PM, Shawn Heisey wrote:
I am doing a DIH full import on a very recent checkout from 
branch_4x.  Something I've recently done differently is enabling 
autocommit.  I am seeing that there are deleted documents in some of 
the indexes.  See "Development Build Indexes" at the bottom of the 
following screenshot.  When the import is complete, the numbered 
shards will contain 13 million documents.


http://dl.dropbox.com/u/97770508/statuspage-deletes-import.png

The MySQL database that this imports from has a unique index on the 
field that Solr is using for its UniqueKey, so it's not possible to 
have duplicates.  Each import uses one SELECT statement for the entire 
13 million document import.  What might be leading to these deleted docs?


Interesting development:  The imports are now up to over 11 million 
documents, but now the number of deleted documents on all shards is zero.


I calculate deleted documents on my stats page by subtracting numDocs 
from maxDoc, information gathered from /admin/mbeans?stats=true.
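
The same arithmetic in SolrJ, as a sketch (assuming the Luke request handler
is available; the exact getters on LukeResponse may differ slightly):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;
import org.apache.solr.common.util.NamedList;

public class DeletedDocCount {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        LukeRequest luke = new LukeRequest();
        luke.setShowSchema(false);   // index stats only, skip the schema dump
        LukeResponse rsp = luke.process(solr);
        NamedList<Object> info = rsp.getIndexInfo();
        int numDocs = (Integer) info.get("numDocs");
        int maxDoc = (Integer) info.get("maxDoc");
        System.out.println("deleted docs: " + (maxDoc - numDocs));
    }
}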


Thanks,
Shawn



Re: How to SWAP cores (or collections) with SolrCloud (SOLR-3866)

2012-12-04 Thread Mark Miller

On Dec 4, 2012, at 4:57 AM, Andre Bois-Crettez andre.b...@kelkoo.com wrote:

 * what can we do to help progress on SOLR-3866? Maybe use case
 scenarios, detailing desired behavior? Constraints on what cores or
 collections are allowed to SWAP, ie. same config, same doc-shard
 assignments?


Yes please - if you could elaborate on that issue, I can help you try and get 
something in.

- Mark

SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)

2012-12-04 Thread Brett Hoerner
Hi,

I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection,
which I called default and haven't used since. I'm using an external ZK
ensemble that was completely empty before I started this cloud.

Once I had all 4 nodes in the cloud I used the collection API to create the
real collections I wanted. I also tested that deleting works.

For example,

# this worked
curl "http://localhost:8984/solr/admin/collections?action=CREATE&name=15678&numShards=4"


# this worked
curl "http://localhost:8984/solr/admin/collections?action=DELETE&name=15678"

Next, I started my indexer service which happily sent many, many updates to
the cloud. Queries against the collections also work just fine.

Finally, a few hours later, I tried doing a create and a delete. Both
operations did nothing, although Solr replied with a 200 OK.

$ curl -i "http://localhost:8984/solr/admin/collections?action=CREATE&name=15679&numShards=4"

HTTP/1.1 200 OK
Content-Type: application/xml; charset=UTF-8
Transfer-Encoding: chunked

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst>
</response>

There is nothing in the stdout/stderr logs, nor the Java logs (I have it
set to WARN).

I have tried bouncing the nodes and it doesn't change anything.

Any ideas? How can I further debug this or what else can I provide?


RE: Solr 4 : Optimize very slow

2012-12-04 Thread Michael Ryan
When I upgraded from 3.2 to 3.6, I found that an optimize - all other variables 
being the same - took about twice as long. Eventually I was able to track this 
down to the new default of MMapDirectory. By changing back to NIOFSDirectory, I 
was able to get the optimize time back down to what it formerly was. I did this 
by adding this to solrconfig.xml:
<directoryFactory name="DirectoryFactory" class="solr.NIOFSDirectoryFactory"/>

I'd suggest trying that to see what effect it has (for us, NIOFSDirectory 
generally performs better across the board, but I've heard just the opposite 
from other people on this mailing list).

If that doesn't improve it, try looking at these things:
1) Are the index files the same size as in 1.4? Perhaps something 
has changed to cause a significant size increase.
2) Is Solr 4 spending more time garbage collecting? Enable gc logging with 
-verbose:gc (or whatever the flag is), or use the jstat utility.
3) Watch the files in the index directory during the optimize and see if they 
are being written more slowly, or if the segment files are being copied 
around more often than before.

-Michael

-Original Message-
From: Sandeep Mestry [mailto:sanmes...@gmail.com] 
Sent: Tuesday, December 04, 2012 6:29 PM
To: solr-user@lucene.apache.org
Subject: Solr 4 : Optimize very slow

Hi All,

I have recently migrated from solr 1.4 to solr 4 and have done the basic
changes required for solr 4 in solrconfig.xml and schema.xml. I have also
rebuilt the index set for solr 4.
We run optimize every morning at 4 am and we keep the index updates off
during this process.
Previously, with 1.4, the optimization used to take around 20-30 mins per
shard, but now with Solr 4 it's taking 6-8 hours or even more.
I have also tested the optimize from the Solr UI and that takes 6-8 hours too.
The hardware is the same, and we have deployed Solr under WAS.
There are 4 shards and every shard contains around 8-9 Gig of data, and we
are using a master-slave configuration with rsync. I have not enabled soft
commit. Also, the committer process is scheduled to run every minute.

I am not sure which part I'm missing, do let me know your inputs please.

Many Thanks in advance,
Sandeep


Re: Loading DictionaryCompoundWordTokenFilterFactory as shared object across all cores

2012-12-04 Thread geetha anjali
We are using the same schema, and we did try using shareSchema=true in
solr.xml. During indexing it works fine: the schema loads a single time. But
during query time, it loads multiple times at the core level.

On Wed, Dec 5, 2012 at 1:00 AM, Chris Hostetter hossman_luc...@fucit.orgwrote:

 : Do we have any way to load
 : DictionaryCompoundWordTokenFilterFactory only once and share it across all
 : the cores?

 I don't think so, but there are tricks you can use in a custom plugin
 variant depending on your use cases, as well as a really easy solution if
 the schemas for all of your collections are identical...


 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201210.mbox/%3Calpine.DEB.2.02.1210161830351.31983@frisbee%3E

 -Hoss



Re: Solr 4 : Optimize very slow

2012-12-04 Thread Upayavira
I tried that search, without success :-(

I suspect what Otis was trying to say was to question why you are
optimising. Optimise was necessary under 1.4, but with newer Solr, the
new TieredMergePolicy does a much better job of handling background
merging, reducing the need for optimize. Try just not doing it at all
and see if your index actually reaches a point where it is needed.
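
For reference, a rough Lucene-level sketch of the policy that Solr configures
from solrconfig.xml (the values here are illustrative, not Solr's defaults):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class MergePolicyExample {
    public static void main(String[] args) {
        TieredMergePolicy mp = new TieredMergePolicy();
        mp.setSegmentsPerTier(10.0);         // similar-sized segments allowed per tier
        mp.setMaxMergeAtOnce(10);            // segments merged in one background merge
        mp.setMaxMergedSegmentMB(5 * 1024);  // cap on merged segment size
        IndexWriterConfig iwc = new IndexWriterConfig(
            Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40));
        iwc.setMergePolicy(mp);              // merging then happens in the background
    }
}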

Upayavira

On Wed, Dec 5, 2012, at 12:31 AM, Otis Gospodnetic wrote:
 Hi,
 
 You should search the ML archives for : optimize wunder Erick Otis :)
 
 Is WAS really AWS? If so, if these are new EC2 instances you are
 unfortunately unable to do a fair apples to apples comparison. Have you
 tried a different set of instances?
 
 Otis
 --
 Performance Monitoring - http://sematext.com/spm
 On Dec 4, 2012 6:29 PM, Sandeep Mestry sanmes...@gmail.com wrote:
 
  Hi All,
 
  I have recently migrated from solr 1.4 to solr 4 and have done the basic
  changes required for solr 4 in solrconfig.xml and schema.xml. I have also
  rebuilt the index set for solr 4.
  We run optimize every morning at 4 am and we keep the index updates off
  during this process.
  Previously, with 1.4, the optimization used to take around 20-30 mins per
  shard, but now with Solr 4 it's taking 6-8 hours or even more.
  I have also tested the optimize from the Solr UI and that takes 6-8 hours too.
  The hardware is the same, and we have deployed Solr under WAS.
  There are 4 shards and every shard contains around 8-9 Gig of data, and we
  are using a master-slave configuration with rsync. I have not enabled soft
  commit. Also, the committer process is scheduled to run every minute.
 
  I am not sure which part I'm missing, do let me know your inputs please.
 
  Many Thanks in advance,
  Sandeep
 


Maximum number of cores

2012-12-04 Thread S_Chawla
Hi,
I am using Solr 4.0 and have created 10 cores. I want to know the maximum
number of cores that can be created in Solr.





Re: Solr 4 : Optimize very slow

2012-12-04 Thread Walter Underwood
It was not necessary under 1.4. It has never been necessary.

It was not necessary in Ultraseek Server in 1996, using the same merging model.

In some cases, it can be a good idea. Since you are continuously updating, this 
is not one of those cases.

wunder

On Dec 4, 2012, at 9:29 PM, Upayavira wrote:

 I tried that search, without success :-(
 
 I suspect what Otis was trying to say was to question why you are
 optimising. Optimise was necessary under 1.4, but with newer Solr, the
 new TieredMergePolicy does a much better job of handling background
 merging, reducing the need for optimize. Try just not doing it at all
 and see if your index actually reaches a point where it is needed.
 
 Upayavira
 
 On Wed, Dec 5, 2012, at 12:31 AM, Otis Gospodnetic wrote:
 Hi,
 
 You should search the ML archives for : optimize wunder Erick Otis :)
 
 Is WAS really AWS? If so, if these are new EC2 instances you are
 unfortunately unable to do a fair apples to apples comparison. Have you
 tried a different set of instances?
 
 Otis
 --
 Performance Monitoring - http://sematext.com/spm
 On Dec 4, 2012 6:29 PM, Sandeep Mestry sanmes...@gmail.com wrote:
 
 Hi All,
 
 I have recently migrated from solr 1.4 to solr 4 and have done the basic
 changes required for solr 4 in solrconfig.xml and schema.xml. I have also
 rebuilt the index set for solr 4.
 We run optimize every morning at 4 am and we keep the index updates off
 during this process.
 Previously, with 1.4, the optimization used to take around 20-30 mins per
 shard, but now with Solr 4 it's taking 6-8 hours or even more.
 I have also tested the optimize from the Solr UI and that takes 6-8 hours too.
 The hardware is the same, and we have deployed Solr under WAS.
 There are 4 shards and every shard contains around 8-9 Gig of data, and we
 are using a master-slave configuration with rsync. I have not enabled soft
 commit. Also, the committer process is scheduled to run every minute.
 
 I am not sure which part I'm missing, do let me know your inputs please.
 
 Many Thanks in advance,
 Sandeep
 

--
Walter Underwood
wun...@wunderwood.org





how to assign dedicated server for indexing and add more shard in SolrCloud

2012-12-04 Thread Jason
I'm using master and slave servers for scaling.
The master is dedicated to indexing and the slaves are for searching.
Now I'm planning to move to SolrCloud.
It has leaders and replicas.
A leader acts like a master and replicas act like slaves. Is that right?
So I'm wondering two things.

First,
how can I assign a dedicated server for indexing in SolrCloud?

Second,
consider I'm using a two-shard cluster with shard replicas
http://wiki.apache.org/solr/SolrCloud#Example_B:_Simple_two_shard_cluster_with_shard_replicas
and I need to add one more shard with replicas.
In this case, the existing two shards and their replicas will already have
many docs, so I want to send newly indexed docs to the new one only.
How can I do this?

Actually, I don't understand SolrCloud perfectly,
so my questions may be ridiculous.
Any input is welcome.
Thanks,





Adding filter in solr suggester component.

2012-12-04 Thread sagarzond
Hi,

We are using the Solr (version 3.6) suggester component for
autocomplete. We indexed a Solr core column
(which we want as the autocomplete result) and it is giving me correct
autocomplete results. Now I want to add a
filter on the suggester's indexed data.

Let's say we have a core with userId and notes fields. We
want to add a userId filter for autocomplete so that
it provides only that user's notes during autocomplete.

I have gone through the following links:

First link:
http://stackoverflow.com/questions/9004266/solr-spell-check-result-based-filter-query

Second link:
http://lucene.472066.n3.nabble.com/Issue-using-filter-query-with-spellCheck-component-td2166322.html

Please help me...


