Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread avenka
For the first install, I copied over all files in the directory example
into, let's call it, install1. I did the same for install2. The two
installs run on different ports, use different jar files, are not really
related to each other in any way as far as I can see. In particular, they
are not multicore. They have the same access control setup via jetty. I
did a diff on config files and confirmed that only port numbers are
different.

Both had been running fine in parallel importing from a common database for
several weeks. The documents indexed by install1, the problematic one
currently, are a vastly bigger (~2.5B) superset of those indexed by install2
(~250M). 

At this point, select queries on install1 incur the NullPointerException
irrespective of whether install2 is running or not. The log file looks like
it is indexing normally as always though. The index is also growing at the
usual rate each day. Just select queries fail. :(

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990476.html
Sent from the Solr - User mailing list archive at Nabble.com.


Authentication Issue in Shards Query

2012-06-20 Thread tosenthu
Hi

I have a Solr server with 5 cores. I have modified the web.xml of solr.war
to have a basic authentication feature enabled for all the web resources.
Also I have written my own login module to perform the login check. Now when I
query a single core, it asks for the username and password, and with proper
credentials the query works fine.  But when I use a shard type of query I get
a 401 error.

Basically the credentials provided to the query are not passed on to the shard
queries. Is there a way to overcome this issue via some configuration?

Also the replication is blocked because of authentication.

Please provide me a workaround for this issue.
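
For context: the internal shard requests are made by Solr itself, which is why
the caller's Basic credentials never reach the other cores. On the client
side, a minimal SolrJ 3.x sketch that sends credentials preemptively to a
single protected core might look like this -- URL, core name, and credentials
are placeholders, so treat it as a starting point, not a fix for the shard
fan-out itself:

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.UsernamePasswordCredentials;
    import org.apache.commons.httpclient.auth.AuthScope;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class AuthQuery {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr/core1");
            HttpClient http = server.getHttpClient();
            // send Basic credentials up front instead of waiting for a 401 challenge
            http.getParams().setAuthenticationPreemptive(true);
            http.getState().setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("user", "password"));
            System.out.println(
                server.query(new SolrQuery("*:*")).getResults().getNumFound());
        }
    }

A common workaround for the server-to-server case is to exempt the paths Solr
calls internally (e.g. /select and /replication) from the security constraint,
or to restrict them by IP instead of by login.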

Regards
Senthil Kumar M R

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Authentication-Issue-in-Shards-Query-tp3990481.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexation Speed?

2012-06-20 Thread Bruno Mannina

Ok thanks for this information,

Le 20/06/2012 05:44, Lance Norskog a écrit :

M. Della Bitta is right- we're not talking about post.jar, but starting Solr:

java -Xmx300m -jar start.jar

On Tue, Jun 19, 2012 at 10:05 AM, Erick Erickson
<erickerick...@gmail.com>  wrote:

Well, it _used_ to be defaulted in the code, but on looking at 3.6 it seems
like it defaults to Integer.MAX_VALUE, so you're fine

And it's all deprecated in 4.x, will be gone

Best
Erick

On Tue, Jun 19, 2012 at 7:07 AM, Bruno Mannina <bmann...@free.fr>  wrote:

Actually -Xmx512m and no effect

Concerning maxFieldLength, no problem, it's commented out

Le 19/06/2012 13:02, Erick Erickson a écrit :


Then try -Xmx600M
next try -Xmx900M


etc. The idea is to bump things on separate runs.

But be a little cautious here. Look in your solrconfig.xml file, you'll see
a commented-out line:
<maxFieldLength>10000</maxFieldLength>

The default behavior for Solr/Lucene is to index the first 10,000 tokens
(not characters; think of tokens as words for now) in each
document and throw the rest on the floor. At the sizes you're talking
about,
that's probably not a problem, but do be aware of it.

Best
Erick

On Tue, Jun 19, 2012 at 5:44 AM, Bruno Mannina <bmann...@free.fr> wrote:

Like that?

java -Xmx300m -jar post.jar myfile.xml



Le 19/06/2012 11:11, Lance Norskog a écrit :


Ah! Java memory size is a java command line option:


http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html

You would try increasing the memory size in stages up to maybe 300m.

On Tue, Jun 19, 2012 at 2:04 AM, Bruno Mannina <bmann...@free.fr>
  wrote:


Le 19/06/2012 10:51, Lance Norskog a écrit :


675 doc/s is respectable for that server. You might move the memory
allocated to Java up and down- there is a balance between amount of
memory in Java v.s. the OS disk buffer.


How can I do that ? is there an option during my command line or in a
config
file?
sorry for this newbie question :(



And, of course, use the latest trunk.

Solr 3.6



On Tue, Jun 19, 2012 at 12:10 AM, Bruno Mannina <bmann...@free.fr>
  wrote:

Correction: file size is 40 Mo !!!

Le 19/06/2012 09:09, Bruno Mannina a écrit :


Dear All,

I would like to know if the indexation speed is right.

I have a 40Go file size with around 27 000 docs inside.
I index around 20 fields,

My (old) test server is a DualCore 3.06GHz Intel Xeon with only 1Go
Ram

The file takes 40 seconds with the command line:
java -jar post.jar myfile.xml

Could I increase this speed or reduce this time?

Thanks a lot,
PS: Newbie user









Solr with Tomcat on VPS

2012-06-20 Thread Hill Michael (NHQ-AC)
I am running Solr in a shared Tomcat v5.5.28 (I have access to all
instances) on a Linux VPS server.
When I set it all up, Tomcat starts properly and I can see that it has
accessed my Solr config directory properly.

I can access the JSP pages if I reference them directly
(http://mysite.com/solr/admin/index.jsp for example) but access to URLs
like:
 1. http://mysite.com/solr/admin/
 2. http://mysite.com/solr/admin/dataimport.jsp?clean=false&commit=true&command=full-import
 3. http://mysite.com/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

all return 404 errors like "URL /solr/select/ was not found on this
server."
I have tried all I can think of and wondering if anyone else has some
thoughts.

This all works great on my development PC where I run the same version
of Tomcat.

Thanks,
Mike


Solr Autosuggest

2012-06-20 Thread Shri Kanish
Hi,
I have a question regarding Solr autosuggest. (If this is not the correct list
to post to, please suggest one.)
 
I have implemented Solr autosuggest with the Suggester component. I have read
in a blog: "Currently implemented Lookups keep their data in memory, so
unlike spellchecker data, this data is discarded on core reload and not
available until you invoke the build command, either explicitly or implicitly
during a commit."
 
I have a Master-Slave setup. If I add new documents to the Master and commit,
then the suggester would be built (as I have set buildOnCommit=true). But when
replication is done, the Slave would reload the core. At that point, will it
affect autosuggestion of the newly added docs?
 
Thanks,
Shri

Re: parameters to decide solr memory consumption

2012-06-20 Thread Erick Erickson
This is really difficult to answer because there are so many variables;
the number of unique terms, whether you store fields or not (which is
really unrelated to memory consumption during searching), etc, etc,
etc. So even trying the index and just looking at the index directory
won't tell you much about memory consumption.

And memory use has been dramatically improved in the 4.x code line, so
anything we can say is actually wrong.

Not to mention that your particular use of caches (filterCache, queryResultCache
etc) will change during runtime.
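
For reference, these caches are sized in solrconfig.xml; a representative 3.x
snippet, with purely illustrative values:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512"/>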

I'm afraid you'll just have to try it and see.

Yes, LIA (Lucene in Action) is accurate...

Best
Erick

On Tue, Jun 19, 2012 at 8:28 AM, Sachin Aggarwal
<different.sac...@gmail.com> wrote:
 hello,

 need help regarding how solr stores the indexes. I was reading an article
 that says solr also stores the indexes in the same format as explained in
 appendix B of "Lucene in Action". Is it true?

 and what parameters do I need to focus on while estimating the memory used
 by my use case?

 as I have a table like (userid, username, usertime, userlocation, userphn,
 timestamp, address),
 what I believe is that in my case the cardinality of some fields like gender,
 location and userphnmodel will be very low. Will that influence memory use?

 any links to read further will be appreciated.

 --

 Thanks & Regards

 Sachin Aggarwal
 7760502772


Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread Erick Erickson
Internal Lucene document IDs are signed 32-bit numbers, so having
2.5B docs seems to be just _asking_ for trouble. Which could
explain the fact that this just came out of thin air. If you kept adding
docs to the problem instance, you wouldn't have changed configs
etc, just added more docs

I really think it's time to shard.

Best
Erick

On Wed, Jun 20, 2012 at 2:15 AM, avenka <ave...@gmail.com> wrote:
 For the first install, I copied over all files in the directory example
 into, let's call it, install1. I did the same for install2. The two
 installs run on different ports, use different jar files, are not really
 related to each other in any way as far as I can see. In particular, they
 are not multicore. They have the same access control setup via jetty. I
 did a diff on config files and confirmed that only port numbers are
 different.

 Both had been running fine in parallel importing from a common database for
 several weeks. The documents indexed by install1, the problematic one
 currently, are a vastly bigger (~2.5B) superset of those indexed by install2
 (~250M).

 At this point, select queries on install1 incur the NullPointerException
 irrespective of whether install2 is running or not. The log file looks like
 it is indexing normally as always though. The index is also growing at the
 usual rate each day. Just select queries fail. :(

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990476.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: 3 Way Solr Join . . ?

2012-06-20 Thread Sabeer Hussain
I have a similar situation in my application. I have five different entities.
The relationships among entities as follows

Protocol -- (zero or more) Study -- (zero or more) Patient
Protocol -- (zero or more) Drug
Patient -- (zero or more) Study
Form -- (zero or many) Study

Moreover, all these entities can exist independently (as per the
requirements of my application), so I cannot create a document that includes
all these entities using denormalization. Suppose I need to find out the Drug
Name (from the Drug entity), Protocol Name (from the Protocol entity), Study
Name (from the Study entity), Patient Name (from the Patient entity) and Form
Name (from the Form entity) based on a Drug Batch Number (from the Drug
entity) that I pass in. Using Join in Solr, I can get either the child or the
parent, not both. What is the best way to index the data in Solr? Do I need
to create separate indices for each entity or a single one for all?
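
For illustration, Solr's join query parser relates two document types kept in
a single index. The field names below are invented: this matches drug
documents by batch number, collects their id values, and returns the documents
whose drug_id references one of them:

    q={!join from=id to=drug_id}batch_number:B123

Each additional hop in a multi-entity chain needs another nested join, which
is part of what makes multi-way joins awkward in Solr.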


--
View this message in context: 
http://lucene.472066.n3.nabble.com/3-Way-Solr-Join-tp3815979p3990515.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread avenka
Erick, thanks for pointing that out. I was going to say in my original post
that it is almost like some limit on max documents got violated all of a
sudden, but the rest of the symptoms didn't seem to quite match. But now
that I think about it, the problem probably happened at 2B (corresponding
exactly to the size of the signed int space) as my ID space in the database
has roughly 85% holes and the problem probably happened when the ID hit
around 2.4B. 

It is still odd that indexing appears to proceed normally and the select
queries know which IDs are used, because the error happens only for queries
with non-empty results; e.g., searching for an ID that doesn't exist gives a
valid "0 numResponses" response. Is this because solr uses 'long' or more
for indexing (given that the schema supports long) but not in the querying
modules?

I hadn't used solr sharding because I really needed rolling partitions,
where I keep a small index of recent documents and throw the rest into a
slow archive index. So maintaining the smaller instance2 (usually < 50M)
and replicating it if needed was my homebrewed sharding approach. But I
guess it is time to shard the archive after all.

AV

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990534.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Schema / Config Error?

2012-06-20 Thread Jan Høydahl
As I understand, James is not upgrading, but trying to start a freshly
downloaded 3.6.0.

James, can you provide some more details? Especially: which app server are you
using, and how did you start Solr? Can you copy/paste the error msg from your
log files?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 6. juni 2012, at 13:33, Jack Krupansky wrote:

 Read CHANGES.txt carefully, especially the section entitled "Upgrading from
 Solr 3.5". For example,
 
 * As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of
 solrconfig.xml are deprecated and replaced with a new <indexConfig> section.
 Read more in SOLR-1052 below.
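 
 For illustration, the consolidated 3.6 section looks roughly like this;
 values are examples, not recommendations:
 
     <indexConfig>
       <ramBufferSizeMB>32</ramBufferSizeMB>
       <mergeFactor>10</mergeFactor>
       <lockType>native</lockType>
     </indexConfig>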
 
 If you simply copied your schema/config directly, unchanged, then this could 
 be the problem.
 
 You may need to compare your schema/config line-by-line to the new 3.6 
 schema/config for any differences.
 
 -- Jack Krupansky
 
 -Original Message- From: Erick Erickson
 Sent: Wednesday, June 06, 2012 6:57 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Schema / Config Error?
 
 That implies one of two things:
 1) you changed solr.xml. I'd go back to the original and re-edit
 anything you've changed
 2) you somehow got a corrupted download. Try blowing your installation
 away and getting a new copy
 
 Because it works perfectly for me.
 
 Best
 Erick
 
 On Wed, Jun 6, 2012 at 4:14 AM, Spadez <james_will...@hotmail.com> wrote:
 Hi,
 
 I installed a fresh copy of Solr 3.6.0 on my server but I get the following
 page when I try to access Solr:
 
 http://176.58.103.78:8080/solr/
 
 It says errors to do with my Solr.xml. This is my solr.xml:
 
 
 
 I really can't figure out how I am meant to fix this, so if anyone is able to
 give some input I would really appreciate it.
 
 James
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html
 Sent from the Solr - User mailing list archive at Nabble.com. 
 



Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread Erick Erickson
Let's make sure we're talking about the same thing. Solr happily
indexes and stores long (64) bit values, no problem. What it doesn't
do is assign _internal_ documents IDs as longs, those are ints.

On admin/statistics, look at maxDocs and numDocs. maxDocs + 1 will be the
next _internal_ lucene doc id assigned, so if that's wonky or > 2B, this
is where the rub happens. BTW, the difference between numDocs and
maxDocs is the number of documents deleted from your index. If your number
of current documents is much smaller than 2B, you can get maxDocs
to equal numDocs if you optimize, and get yourself some more headroom.
Whether your index will be OK I'm not prepared to guarantee though...
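
For instance, both counters can also be read without the admin UI via the Luke
request handler, assuming the default /admin/luke mapping:

    curl 'http://localhost:8983/solr/admin/luke?numTerms=0'

Negative values there (as reported later in this thread) confirm the int
wraparound.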

But if I'm reading your notes correctly, the 85% holes applies to a value in
your document, and has nothing to do with the internal lucene ID issue.

But internally, the int limit isn't robustly enforced, so I'm not surprised
that it pops out (if, indeed, this is your problem) in odd places.

Best
Erick

On Wed, Jun 20, 2012 at 10:02 AM, avenka <ave...@gmail.com> wrote:
 Erick, thanks for pointing that out. I was going to say in my original post
 that it is almost like some limit on max documents got violated all of a
 sudden, but the rest of the symptoms didn't seem to quite match. But now
 that I think about it, the problem probably happened at 2B (corresponding
 exactly to the size of the signed int space) as my ID space in the database
 has roughly 85% holes and the problem probably happened when the ID hit
 around 2.4B.

 It is still odd that indexing appears to proceed normally and the select
 queries know which IDs are used, because the error happens only for queries
 with non-empty results; e.g., searching for an ID that doesn't exist gives a
 valid "0 numResponses" response. Is this because solr uses 'long' or more
 for indexing (given that the schema supports long) but not in the querying
 modules?

 I hadn't used solr sharding because I really needed rolling partitions,
 where I keep a small index of recent documents and throw the rest into a
 slow archive index. So maintaining the smaller instance2 (usually < 50M)
 and replicating it if needed was my homebrewed sharding approach. But I
 guess it is time to shard the archive after all.

 AV

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990534.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexation Speed?

2012-06-20 Thread Bruno Mannina

Little question please:

I have directories with around 30 files of 40 Mo, with around 17 000 docs in
each file.


Is it better to index:
- file by file, with java -jar post.jar 1.xml, java -jar post.jar 2.xml, etc.
or
- all at the same time, with java -jar post.jar *.xml

All files are verified, so my question is just concerning speed.

Thx for your comments,
Bruno


Le 20/06/2012 05:44, Lance Norskog a écrit :

M. Della Bitta is right- we're not talking about post.jar, but starting Solr:

java -Xmx300m -jar start.jar

On Tue, Jun 19, 2012 at 10:05 AM, Erick Erickson
<erickerick...@gmail.com>  wrote:

Well, it _used_ to be defaulted in the code, but on looking at 3.6 it seems
like it defaults to Integer.MAX_VALUE, so you're fine

And it's all deprecated in 4.x, will be gone

Best
Erick

On Tue, Jun 19, 2012 at 7:07 AM, Bruno Mannina <bmann...@free.fr>  wrote:

Actually -Xmx512m and no effect

Concerning maxFieldLength, no problem, it's commented out

Le 19/06/2012 13:02, Erick Erickson a écrit :


Then try -Xmx600M
next try -Xmx900M


etc. The idea is to bump things on separate runs.

But be a little cautious here. Look in your solrconfig.xml file, you'll see
a commented-out line:
<maxFieldLength>10000</maxFieldLength>

The default behavior for Solr/Lucene is to index the first 10,000 tokens
(not characters; think of tokens as words for now) in each
document and throw the rest on the floor. At the sizes you're talking
about,
that's probably not a problem, but do be aware of it.

Best
Erick

On Tue, Jun 19, 2012 at 5:44 AM, Bruno Mannina <bmann...@free.fr> wrote:

Like that?

java -Xmx300m -jar post.jar myfile.xml



Le 19/06/2012 11:11, Lance Norskog a écrit :


Ah! Java memory size is a java command line option:


http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html

You would try increasing the memory size in stages up to maybe 300m.

On Tue, Jun 19, 2012 at 2:04 AM, Bruno Mannina <bmann...@free.fr>
  wrote:


Le 19/06/2012 10:51, Lance Norskog a écrit :


675 doc/s is respectable for that server. You might move the memory
allocated to Java up and down- there is a balance between amount of
memory in Java v.s. the OS disk buffer.


How can I do that ? is there an option during my command line or in a
config
file?
sorry for this newbie question :(



And, of course, use the latest trunk.

Solr 3.6



On Tue, Jun 19, 2012 at 12:10 AM, Bruno Mannina <bmann...@free.fr>
  wrote:

Correction: file size is 40 Mo !!!

Le 19/06/2012 09:09, Bruno Mannina a écrit :


Dear All,

I would like to know if the indexation speed is right.

I have a 40Go file size with around 27 000 docs inside.
I index around 20 fields,

My (old) test server is a DualCore 3.06GHz Intel Xeon with only 1Go
Ram

The file takes 40 seconds with the command line:
java -jar post.jar myfile.xml

Could I increase this speed or reduce this time?

Thanks a lot,
PS: Newbie user









Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread avenka
Yes, wonky indeed. 
  numDocs : -2006905329
  maxDoc : -1993357870 

And yes, I meant that the holes are in the database auto-increment ID space,
nothing to do with lucene IDs.

I will set up sharding. But is there any way to retrieve most of the current
index? Currently, all select queries, even in ranges in the hundreds of
millions, return the NullPointerException. It would suck to lose all of this.
:(

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990542.html
Sent from the Solr - User mailing list archive at Nabble.com.


Malay Language Detection

2012-06-20 Thread Rohit
Hi,

 

We are using http://code.google.com/p/language-detection/ along with Solr
for language detection, but it seems that the jar doesn't have
support for Malay detection.

So, I created the profile for Malay which is used by the jar; this works in
my local test environment, but I don't know how to get it to work with Solr.
Has anyone else worked on this earlier?
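
For context, the langdetect library is normally wired into Solr (3.5+) via
the langid update processor from the langid contrib; a sketch, with field
names as placeholders -- the custom Malay profile also has to be loadable by
the library itself, e.g. by rebuilding the langdetect jar with the extra
profile, which is an assumption to verify:

    <updateRequestProcessorChain name="langid">
      <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <str name="langid.fl">title,text</str>
        <str name="langid.langField">language</str>
      </processor>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>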

 

 

Regards,

Rohit

 



How to import this Json-line by DIH?

2012-06-20 Thread jueljust


--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-import-this-Json-line-by-DIH-tp3990544.html
Sent from the Solr - User mailing list archive at Nabble.com.


solrj and replication

2012-06-20 Thread tom

hi,

i was just wondering if i need to do something special if i want to have an 
embedded slave get replication working?


my setup is like so:
- in my clustered application that uses embedded solr(j) (for 
performance), the cores are configured as slaves that should connect to 
a master which runs in a jetty.

- the embedded cores dont expose any of the solr servlets

note that the slave config, if started in jetty, does proper 
replication, while when embedded it doesnt.


using solr 3.5

thx

tom


Re: Indexation Speed?

2012-06-20 Thread Erick Erickson
I doubt you'll find any significant difference in indexing speed. But the
post.jar file is really intended as a demo program to quickly get the
examples working. It was never intended to be a production-ready
program. I'd think about using something like SolrJ etc. to index the docs.

And I'm assuming your documents are in the approved Solr format, something
like:
<add>
  <doc>
    <field name="myfield">value for field</field>
    .
    .
  </doc>
  <doc>
    .
    .
    .
  </doc>
</add>

solr will not index arbitrary XML. If you're trying to do this, you'll
need to transform your arbitrary XML into the above format; consider SolrJ
or something like that in this case.
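
A minimal SolrJ sketch of that approach, using the 3.x-era API; the URL and
field names are placeholders, so treat it as a starting point:

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexDocs {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("myfield", "value for field");
            server.add(doc);
            server.commit(); // control commits yourself rather than per file
        }
    }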

Best
Erick

On Wed, Jun 20, 2012 at 10:40 AM, Bruno Mannina <bmann...@free.fr> wrote:
 Little question please:

 I have directories with around 30 files of 40 Mo, with around 17 000 docs in
 each file.

 is it better to index:
 - file by file, with java -jar post.jar 1.xml, java -jar post.jar 2.xml, etc.
 or
 - all at the same time, with java -jar post.jar *.xml

 All files are verified, so my question is just concerning speed

 Thx for your comments,
 Bruno



 Le 20/06/2012 05:44, Lance Norskog a écrit :

 M. Della Bitta is right- we're not talking about post.jar, but starting
 Solr:


 java -Xmx300m -jar start.jar

 On Tue, Jun 19, 2012 at 10:05 AM, Erick Erickson
 <erickerick...@gmail.com>  wrote:

 Well, it _used_ to be defaulted in the code, but on looking at 3.6 it seems
 like it defaults to Integer.MAX_VALUE, so you're fine

 And it's all deprecated in 4.x, will be gone

 Best
 Erick

 On Tue, Jun 19, 2012 at 7:07 AM, Bruno Mannina <bmann...@free.fr>  wrote:

 Actually -Xmx512m and no effect

 Concerning maxFieldLength, no problem, it's commented out

 Le 19/06/2012 13:02, Erick Erickson a écrit :

 Then try -Xmx600M
 next try -Xmx900M


 etc. The idea is to bump things on separate runs.

 But be a little cautious here. Look in your solrconfig.xml file, you'll
 see a commented-out line:
 <maxFieldLength>10000</maxFieldLength>

 The default behavior for Solr/Lucene is to index the first 10,000 tokens
 (not characters; think of tokens as words for now) in each
 document and throw the rest on the floor. At the sizes you're talking
 about,
 that's probably not a problem, but do be aware of it.

 Best
 Erick

 On Tue, Jun 19, 2012 at 5:44 AM, Bruno Mannina <bmann...@free.fr>
  wrote:

 Like that?

 java -Xmx300m -jar post.jar myfile.xml



 Le 19/06/2012 11:11, Lance Norskog a écrit :

 Ah! Java memory size is a java command line option:



 http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html

 You would try increasing the memory size in stages up to maybe 300m.

 On Tue, Jun 19, 2012 at 2:04 AM, Bruno Mannina <bmann...@free.fr>
  wrote:


 Le 19/06/2012 10:51, Lance Norskog a écrit :

 675 doc/s is respectable for that server. You might move the memory
 allocated to Java up and down- there is a balance between amount of
 memory in Java v.s. the OS disk buffer.


 How can I do that ? is there an option during my command line or in
 a
 config
 file?
 sorry for this newbie question :(


 And, of course, use the latest trunk.

 Solr 3.6


 On Tue, Jun 19, 2012 at 12:10 AM, Bruno Mannina <bmann...@free.fr>
  wrote:

 Correction: file size is 40 Mo !!!

 Le 19/06/2012 09:09, Bruno Mannina a écrit :

 Dear All,

 I would like to know if the indexation speed is right.

 I have a 40Go file size with around 27 000 docs inside.
 I index around 20 fields,

 My (old) test server is a DualCore 3.06GHz Intel Xeon with only
 1Go
 Ram

 The file takes 40 seconds with the command line:
 java -jar post.jar myfile.xml

 Could I increase this speed or reduce this time?

 Thanks a lot,
 PS: Newbie user







Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread Erick Erickson
That indeed sucks. But I don't personally know of a good way to
try to split apart an existing index into shards. I'm afraid you're
going to be stuck with re-indexing

Wish I had a better solution
Erick

On Wed, Jun 20, 2012 at 10:45 AM, avenka <ave...@gmail.com> wrote:
 Yes, wonky indeed.
  numDocs : -2006905329
  maxDoc : -1993357870

 And yes, I meant that the holes are in the database auto-increment ID space,
 nothing to do with lucene IDs.

 I will set up sharding. But is there any way to retrieve most of the current
 index? Currently, all select queries even in ranges in the hundreds of
 millions return the NullPointerException. It would suck to lose all of this.
 :(

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990542.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread avenka
Thanks. Do you know if the tons of index files with names like '_zxt.tis' in
the index/data/ directory have the lucene IDs embedded in the binaries? The
files look good to me and are partly readable even if in binary. I am
wondering if I could just set up a new solr instance and move these index
files there and hope to use them (or most of them) as is without shards? If
so, I will just set up a separate sharded index for the documents indexed
henceforth, but won't bother splitting the huge existing index.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990560.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread Erick Erickson
Don't even try to do that. First of all, you have to have a reliable way to
index the same docs to the same shards. The docs are all mixed up
in the segment files, and that would lead to chaos. Solr/Lucene report
the same doc multiple times if it's in different shards, so if you
ever updated a document, you wouldn't know what shard to
send it to.

Second, the segments are all parts of a single index, and Solr
(well, actually Lucene) expects them to be consistent. Putting some on
one shard and some on another would probably not allow Solr to start
(but I confess I've never tried that).

So I really wouldn't even try to go there.

Best
Erick

On Wed, Jun 20, 2012 at 12:35 PM, avenka <ave...@gmail.com> wrote:
 Thanks. Do you know if the tons of index files with names like '_zxt.tis' in
 the index/data/ directory have the lucene IDs embedded in the binaries? The
 files look good to me and are partly readable even if in binary. I am
 wondering if I could just set up a new solr instance and move these index
 files there and hope to use them (or most of them) as is without shards? If
 so, I will just set up a separate sharded index for the documents indexed
 henceforth, but won't bother splitting the huge existing index.


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990560.html
 Sent from the Solr - User mailing list archive at Nabble.com.


write.lock

2012-06-20 Thread Christopher Gross
I'm running Solr 3.4.  The past 2 months I've been getting a lot of
write.lock errors.  I switched to the "simple" lockType (and made it
clear the lock on restart), but my index is still locking up a few
times a week.

I can't seem to determine what is causing the locks -- does anyone out
there have any ideas/experience as to what is causing the locks, and
what config changes I can make in order to prevent the locks?
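
For reference, the settings involved live in the indexDefaults/mainIndex
sections of solrconfig.xml in 3.x; a sketch to adapt -- "native" delegates
locking to the OS, so a lock held by a killed JVM doesn't outlive the process
the way a "simple" lock file can:

    <mainIndex>
      <lockType>native</lockType>
      <unlockOnStartup>false</unlockOnStartup>
    </mainIndex>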

Any help would be very appreciated!

-- Chris


Help with Solr File Based spell check

2012-06-20 Thread Sanjay Dua - Network
Hi,

We are trying to implement file-based spell check in our application using
Solr 1.4. This is the configuration we have written:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">/usr/home/lilly/sixfeetup/projects/alm-buildout/etc/solr/spelling.txt</str>
    <str name="spellcheckIndexDir">./filespellchecker</str>
    <str name="accuracy">0.7</str>
  </lst>
  <str name="queryAnalyzerFieldType">text</str>
</searchComponent>


We are facing an issue and need your help on the same.

When the user searches for a word like "medicine", which is correctly spelled
and present in the dictionary, we still get a suggestion "medicines" from the
dictionary.

We only want suggestions if the word is incorrectly spelled or is not included
in the dictionary.

Can you please provide some suggestions?
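
For reference, a typical request against this component looks like the
following; the handler path is an assumption about this setup. Clients often
check the correctlySpelled flag returned with spellcheck.extendedResults=true
to suppress suggestions for in-dictionary words, though whether the 1.4
file-based checker populates that flag is worth verifying:

    http://localhost:8983/solr/select?q=medicine&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=5&spellcheck.extendedResults=true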

Regards,
Sanjay Dua


Re: LanguageDetection inside of ExtractingRequestHandler

2012-06-20 Thread Jan Høydahl
Hi,

In my opinion, instead of hardcoding such functionality into multiple request 
handlers, we should go the opposite direction - modularization, factoring out 
Tika extraction into its own UpdateProcessor 
(https://issues.apache.org/jira/browse/SOLR-1763). Then the 
ExtractingRequestHandler would eventually go away, and you could use it and 
language detection with any Request Handler you choose, including XML and DIH...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 19. juni 2012, at 17:10, Martin Ruckli wrote:

 Hi all,
 
 I just wanted to check if there is a demand for this feature. I had to 
 implement this functionality for one of our customers and would like to 
 contribute it.
 
 Here is the use case:
 We are using the ExtractingRequestHandler with the extractOnly=true flag set.
 With a request to this handler we get the content of a posted document like 
 we want to. We would also like to detect the language and return it as a 
 metadata field in the response from solr.
 As there is already support for LanguageDetection based on tika integrated 
 into solr, the only thing what I did was add a new param to enable or disable 
 this feature and then do the language detection nearly the same way as it is 
 done in the TikaLanguageIdentifierUpdateProcessor
 I think this would be a nice addition, especially in the extractOnly mode.
 
 What are your thoughts on this?
 
 Cheers
 Martin
 



Exception using distributed field-collapsing

2012-06-20 Thread Bryan Loofbourrow
I am doing a search on three shards with identical schemas (I
double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is
giving me back the exception listed at the bottom of this email:



Other information:



My schema uses the following field types: StrField, DateField,
TrieDateField, TextField, SortableInt, SortableLong, BoolField



My query looks like this (I've messed with it to anonymize but, I hope,
kept the essentials):

http://[solr core2]/select/?start=0&rows=25&q={!qsol}machines&sort=[sort
field]&fl=[list of fields]&shards=[solr core1]%2c[solr core2]%2c[solr
core3]&group=true&group.field=[group field]



java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.String

at 
org.apache.lucene.search.FieldComparator$StringOrdValComparator.compareValues(FieldComparator.java:844)

at 
org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:180)

at 
org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:154)

at java.util.TreeMap.put(TreeMap.java:547)

at java.util.TreeSet.add(TreeSet.java:255)

at 
org.apache.lucene.search.grouping.SearchGroup$GroupMerger.updateNextGroup(SearchGroup.java:222)

at 
org.apache.lucene.search.grouping.SearchGroup$GroupMerger.merge(SearchGroup.java:285)

at 
org.apache.lucene.search.grouping.SearchGroup.merge(SearchGroup.java:340)

at 
org.apache.solr.search.grouping.distributed.responseprocessor.SearchGroupShardResponseProcessor.process(SearchGroupShardResponseProcessor.java:77)

at 
org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:565)

at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:548)

at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)

at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)

at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)

at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)

at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)

at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)

at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)

at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)

at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)

at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)

at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:679)



Any thoughts or advice?



Thanks,



-- Bryan


Re: Exception using distributed field-collapsing

2012-06-20 Thread Martijn v Groningen
Hi Bryan,

What is the fieldtype of the groupField? You can only group by field
that is of type string as is described in the wiki:
http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters

When you group by another field type an http 400 should be returned
instead of this error. At least that's what I'd expect.

Martijn

On 20 June 2012 20:37, Bryan Loofbourrow
<bloofbour...@knowledgemosaic.com> wrote:
 I am doing a search on three shards with identical schemas (I
 double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is
 giving me back the exception listed at the bottom of this email:



 Other information:



 My schema uses the following field types: StrField, DateField,
 TrieDateField, TextField, SortableInt, SortableLong, BoolField



 My query looks like this (I've messed with it to anonymize but, I hope,
 kept the essentials):

 http://[solr core2]/select/?start=0&rows=25&q={!qsol}machines&sort=[sort
 field]&fl=[list of fields]&shards=[solr core1]%2c[solr core2]%2c[solr
 core3]&group=true&group.field=[group field]



 java.lang.ClassCastException: java.util.Date cannot be cast to 
 java.lang.String

        at 
 org.apache.lucene.search.FieldComparator$StringOrdValComparator.compareValues(FieldComparator.java:844)

        at 
 org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:180)

        at 
 org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:154)

        at java.util.TreeMap.put(TreeMap.java:547)

        at java.util.TreeSet.add(TreeSet.java:255)

        at 
 org.apache.lucene.search.grouping.SearchGroup$GroupMerger.updateNextGroup(SearchGroup.java:222)

        at 
 org.apache.lucene.search.grouping.SearchGroup$GroupMerger.merge(SearchGroup.java:285)

        at 
 org.apache.lucene.search.grouping.SearchGroup.merge(SearchGroup.java:340)

        at 
 org.apache.solr.search.grouping.distributed.responseprocessor.SearchGroupShardResponseProcessor.process(SearchGroupShardResponseProcessor.java:77)

        at 
 org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:565)

        at 
 org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:548)

        at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)

        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)

        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)

        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)

        at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

        at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

        at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

        at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)

        at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)

        at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)

        at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

        at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)

        at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)

        at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)

        at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

        at java.lang.Thread.run(Thread.java:679)



 Any thoughts or advice?



 Thanks,



 -- Bryan



-- 
Met vriendelijke groet,

Martijn van Groningen


RE: Exception using distributed field-collapsing

2012-06-20 Thread Bryan Loofbourrow
 Hi Bryan,

 What is the fieldtype of the groupField? You can only group by field
 that is of type string as is described in the wiki:
 http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters

 When you group by another field type a http 400 should be returned
 instead if this error. At least that what I'd expect.

 Martijn

Martijn,

The group-by field is a string. I have been unable to figure out how a date
comes into the picture at all, and have basically been wondering if there
is some problem in the grouping code that misaligns the field values from
different results in the group, so that it is not comparing like with
like. Not a strong theory, just the only thing I can think of.

-- Bryan


Re: Indexation Speed?

2012-06-20 Thread Bruno Mannina

Hi Erick,


I doubt you'll find any significant difference in indexing speed. But the
post.jar file is really intended as a demo program to quickly get the
examples working. It was never intended to be a production-ready
program. I'd think about using something like SolrJ etc. to index the docs.


ah?! I don't know SolrJ yet :(
Do I need to know how to program in Java?

I transformed all my xml source files to the xml structure below and I'm 
using post.jar

I thought post.jar was a standard tool to index docs.


And I'm assuming your documents are in the approved Solr format, something
like:
<add>
  <doc>
    <field name="myfield">value for field</field>
    .
    .
  </doc>
  <doc>
    .
    .
    .
  </doc>
</add>

Yes all my xml docs have this format.


solr will not index arbitrary XML. If you're trying to do this, you'll
need to transform your arbitrary XML into the above format; consider SolrJ
or something like that in this case.


If all my xml docs are in the xml structure above, is it necessary to 
use SolrJ ?





RE: How to import this Json-line by DIH?

2012-06-20 Thread Steven A Rowe
Hi jueljust,

Nabble removed the entire content of your email before sending it to the 
mailing list.

Maybe use a different service that doesn't throw away your message?

Steve


From: jueljust [juelj...@gmail.com]
Sent: Wednesday, June 20, 2012 10:56 AM
To: solr-user@lucene.apache.org
Subject: How to import this Json-line by DIH?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-import-this-Json-line-by-DIH-tp3990544.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Indexation Speed?

2012-06-20 Thread Erik Hatcher
I think it's a bit of an "it depends" on whether post.jar is the right choice
for production.

It -is- SolrJ inside after all, Erick :) and it's pretty much the same as using 
curl. Just be sure you control commits as needed. 

Erik

On Jun 20, 2012, at 15:18, Bruno Mannina <bmann...@free.fr> wrote:

 Hi Erick,
 
 I doubt you'll find any significant difference in indexing speed. But the
 post.jar file is really intended as a demo program to quickly get the
 examples working. It was never intended to be a production-ready
 program. I'd think about using something like SolrJ etc. to index the docs.
 
 ah?! I don't know SolrJ yet :(
 Do I need to know how to program in Java?
 
 I transformed all my xml source files to the xml structure below and I'm 
 using post.jar
 I thought post.jar was a standard tool to index docs.
 
 And I'm assuming your documents are in the approved Solr format, something
 like:
 <add>
   <doc>
     <field name="myfield">value for field</field>
     .
     .
   </doc>
   <doc>
     .
     .
     .
   </doc>
 </add>
 Yes all my xml docs have this format.
 
 solr will not index arbitrary XML. If you're trying to do this, you'll
 need to transform your arbitrary XML into the above format; consider
 SolrJ or something like that in this case.
 
 If all my xml docs are in the xml structure above, is it necessary to use 
 SolrJ ?
 
 


Re: solr java.lang.NullPointerException on select queries

2012-06-20 Thread avenka
Erick, thanks for the advice, but let me make sure you haven't misunderstood
what I was asking.

I am not trying to split the huge existing index in install1 into shards. I
am also not trying to make the huge install1 index as one shard of a sharded
solr setup. I plan to use a sharded setup only for future docs.

I do want to avoid re-indexing the docs in install1 and instead think of them
as a slow "tape archive" index server if I ever need to go and query the
past documents. So I was wondering if I could somehow use the existing
segment files to run an isolated (unsharded) solr server that lets me query
roughly the first 2B docs before the wraparound problem happened. If the
negative internal doc IDs have pervasively corrupted the segment files,
this would not be possible, but I am not able to imagine an underlying
lucene design that would cause such a problem. Is my only option to re-index
the past 2B docs if I want to be able to query them at this point or is
there any way to use the existing segment files?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990615.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Apache Lucene Eurocon 2012

2012-06-20 Thread Lance Norskog
Hello Mikhail-

Your mail did not come through.

Hope things are well,

Lance Norskog
Lucid Imagination

On Wed, Jun 20, 2012 at 11:16 AM, Mikhail Khludnev
<mkhlud...@griddynamics.com> wrote:
 up

 --
 Sincerely yours
 Mikhail Khludnev
 Tech Lead
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



-- 
Lance Norskog
goks...@gmail.com


Re: Editing solr update handler sub class

2012-06-20 Thread Shameema Umer
Can anybody tell me where are the lucene jar files
org.apache.lucene.index and org.apache.lucene.search located?

Thanks
Shameema

On Wed, Jun 20, 2012 at 4:44 PM, Shameema Umer <shem...@gmail.com> wrote:
 Hi,

 I decompiled DirectUpdateHandler2.class to a .java file and edited it to
 suit my requirement to stop overwriting duplicates (I needed the first
 fetched tstamp).
 But when I tried to compile it back to a .class file, it shows 91 errors. Am
 I wrong anywhere?

 I am new to Java application development but fluent in web languages.

 Please help.

 Thanks
 Shameema


Re: Editing solr update handler sub class

2012-06-20 Thread irshad siddiqui
 Hi,

Jar files are located in the dist folder. Check your dist folder, or you can
check your solrconfig.xml file, where you will find the jar location paths.
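
For example, compiling a modified DirectUpdateHandler2 against those jars
might look like this; paths assume a standard Solr distribution layout, and
the 91 errors usually just mean the Solr and Lucene jars were missing from
the compile classpath:

    # assumes the source file keeps its original package declaration
    javac -d . -cp "dist/*:lib/*" DirectUpdateHandler2.java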


On Thu, Jun 21, 2012 at 9:47 AM, Shameema Umer <shem...@gmail.com> wrote:

 Can anybody tell me where are the lucene jar files
 org.apache.lucene.index and org.apache.lucene.search located?

 Thanks
 Shameema

 On Wed, Jun 20, 2012 at 4:44 PM, Shameema Umer <shem...@gmail.com> wrote:
  Hi,
 
  I decompiled DirectUpdateHandler2.class to a .java file and edited it to
  suit my requirement to stop overwriting duplicates (I needed the first
  fetched tstamp).
  But when I tried to compile it back to a .class file, it shows 91 errors. Am
  I wrong anywhere?
 
  I am new to java application but fluent in web languages.
 
  Please help.
 
  Thanks
  Shameema



Re: parameters to decide solr memory consumption

2012-06-20 Thread Sachin Aggarwal
thanks for the help


hey,
I tried an exercise:
I'm storing a schema (uuid, key, userlocation).
uuid and key are unique, and userlocation has a cardinality of 150.
uuid and key are stored and indexed, while userlocation is indexed but not
stored.
still, the index directory size is 51 MB just for 200,000 records -- don't you
think that's not optimal?
what if I go for billions of records?

-- 

Thanks & Regards

Sachin Aggarwal
7760502772