solr wiki

2018-11-29 Thread Gary Sieling
Can I be added to the Solr wiki contributors list?

Username: garysieling

Thanks
Gary


Re: [SolrCloud] shard hash ranges changed after restoring backup

2016-06-16 Thread Gary Yao
Hi Erick,

I should add that our Solr cluster is in production and new documents
are constantly indexed. The new cluster has been up for three weeks now.
The problem was discovered only now because in our use case Atomic
Updates and RealTime Gets are mostly performed on new documents. With
almost absolute certainty there are already documents in the index that
were distributed to the shards according to the new hash ranges. If we
just changed the hash ranges in ZooKeeper, the index would still be in
an inconsistent state.

Is there any way to recover from this without having to re-index all
documents?

Best,
Gary

2016-06-15 19:23 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
> Simplest, though a bit risky is to manually edit the znode and
> correct the znode entry. There are various tools out there, including
> one that ships with Zookeeper (see the ZK documentation).
>
> Or you can use the zkcli scripts (the Zookeeper ones) to get the znode
> down to your local machine, edit it there and then push it back up to ZK.
>
> I'd do all this with my Solr nodes shut down, then ensure that my ZK
> ensemble was consistent after the update, etc.
>
> Best,
> Erick
>
> On Wed, Jun 15, 2016 at 8:36 AM, Gary Yao <gary@zalando.de> wrote:
>> Hi all,
>>
>> My team at work maintains a SolrCloud 5.3.2 cluster with multiple
>> collections configured with sharding and replication.
>>
>> We recently backed up our Solr indexes using the built-in backup
>> functionality. After the cluster was restored from the backup, we
>> noticed that atomic updates of documents are failing occasionally with
>> the error message 'missing required field [...]'. The exceptions are
>> thrown on a host on which the document to be updated is not stored. From
>> this we are deducing that there is a problem with finding the right host
>> by the hash of the uniqueKey. Indeed, our investigations so far showed
>> that for at least one collection in the new cluster, the shards have
>> different hash ranges assigned now. We checked the hash ranges by
>> querying /admin/collections?action=CLUSTERSTATUS. Find below the shard
>> hash ranges of one collection that we debugged.
>>
>>   Old cluster:
>> shard1_0 8000 - aaa9
>> shard1_1  - d554
>> shard2_0 d555 - fffe
>> shard2_1  - 2aa9
>> shard3_0 2aaa - 5554
>> shard3_1  - 7fff
>>
>>   New cluster:
>> shard1 8000 - aaa9
>> shard2  - d554
>> shard3 d555 - 
>> shard4 0 - 2aa9
>> shard5 2aaa - 5554
>> shard6  - 7fff
>>
>>   Note that the shard names differ because the old cluster's shards were
>>   split.
>>
>> As you can see, the ranges of shard3 and shard4 differ from the old
>> cluster. This change of hash ranges matches with the symptoms we are
>> currently experiencing.
>>
>> We found this JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750
>> in which David Smiley comments:
>>
>>   shard hash ranges aren't restored; this error could be disastrous
>>
>> It seems that this is what happened to us. We would like to hear some
>> suggestions on how we could recover from this problem.
>>
>> Best,
>> Gary
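
A rough sketch of the zkcli route Erick describes, assuming a 5.x install and a
ZooKeeper ensemble reachable at zk1:2181 (host, temp file and znode path are
placeholders; on 5.3 a collection created with stateFormat=2 keeps its state in
/collections/<name>/state.json rather than in /clusterstate.json):

  # pull the cluster state down, edit the shard "range" values, push it back
  server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 \
      -cmd getfile /clusterstate.json /tmp/clusterstate.json
  # ... edit /tmp/clusterstate.json locally ...
  server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 \
      -cmd putfile /clusterstate.json /tmp/clusterstate.json

As Erick notes, this is best done with the Solr nodes shut down, checking
afterwards that the ZooKeeper ensemble is consistent.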


[SolrCloud] shard hash ranges changed after restoring backup

2016-06-15 Thread Gary Yao
Hi all,

My team at work maintains a SolrCloud 5.3.2 cluster with multiple
collections configured with sharding and replication.

We recently backed up our Solr indexes using the built-in backup
functionality. After the cluster was restored from the backup, we
noticed that atomic updates of documents are failing occasionally with
the error message 'missing required field [...]'. The exceptions are
thrown on a host on which the document to be updated is not stored. From
this we are deducing that there is a problem with finding the right host
by the hash of the uniqueKey. Indeed, our investigations so far showed
that for at least one collection in the new cluster, the shards have
different hash ranges assigned now. We checked the hash ranges by
querying /admin/collections?action=CLUSTERSTATUS. Find below the shard
hash ranges of one collection that we debugged.

  Old cluster:
shard1_0 8000 - aaa9
shard1_1  - d554
shard2_0 d555 - fffe
shard2_1  - 2aa9
shard3_0 2aaa - 5554
shard3_1  - 7fff

  New cluster:
shard1 8000 - aaa9
shard2  - d554
shard3 d555 - 
shard4 0 - 2aa9
shard5 2aaa - 5554
shard6  - 7fff

  Note that the shard names differ because the old cluster's shards were
  split.

As you can see, the ranges of shard3 and shard4 differ from the old
cluster. This change of hash ranges matches with the symptoms we are
currently experiencing.
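
(For reference, the check described above is a single Collections API call; the
host and collection name here are placeholders. Each shard's assigned range is
reported in the "range" attribute of the response.)

  curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=mycollection&wt=json'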

We found this JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750
in which David Smiley comments:

  shard hash ranges aren't restored; this error could be disastrous

It seems that this is what happened to us. We would like to hear some
suggestions on how we could recover from this problem.

Best,
Gary


Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-27 Thread Gary Taylor

Alex,

I've created JIRA ticket: https://issues.apache.org/jira/browse/SOLR-7174

In response to your suggestions below:

1. No exceptions are reported, even with onError removed.
2. ProcessMonitor shows only the very first epub file is being read 
(repeatedly)

3. I can repeat this on Ubuntu (14.04) by following the same steps.
4. Ticket raised (https://issues.apache.org/jira/browse/SOLR-7174)

Additionally (and I've added this on the ticket), if I change the 
dataConfig to use FileDataSource and PlainTextEntityProcessor, and just 
list *.txt files, it works!


<dataConfig>
    <dataSource type="FileDataSource" name="bin" />
    <document>
        <entity name="files" dataSource="null" rootEntity="false"
                processor="FileListEntityProcessor"
                baseDir="c:/Users/gt/Documents/HackerMonthly/epub"
                fileName=".*txt">

            <field column="fileAbsolutePath" name="id" />
            <field column="fileSize" name="size" />
            <field column="fileLastModified" name="lastModified" />

            <entity name="documentImport"
                    processor="PlainTextEntityProcessor"
                    url="${files.fileAbsolutePath}" format="text"
                    dataSource="bin">

                <field column="plainText" name="content"/>
            </entity>
        </entity>
    </document>
</dataConfig>

So it's something related to BinFileDataSource and TikaEntityProcessor.

Thanks,
Gary.

On 26/02/2015 14:24, Gary Taylor wrote:

Alex,

That's great.  Thanks for the pointers.  I'll try and get more info on 
this and file a JIRA issue.


Kind regards,
Gary.

On 26/02/2015 14:16, Alexandre Rafalovitch wrote:

On 26 February 2015 at 08:32, Gary Taylor g...@inovem.com wrote:

Alex,

Same results on recursive=true / recursive=false.

I also tried importing plain text files instead of epub (still using
TikaEntityProcessor though) and get exactly the same result - ie. all files
fetched, but only one document indexed in Solr.

To me, this would indicate that something is a problem with the inner
DIH entity then. As a next set of steps, I would probably
1) remove both onError statements and see if there is an exception
that is being swallowed.
2) run the import under ProcessMonitor and see if the other files are
actually being read
https://technet.microsoft.com/en-us/library/bb896645.aspx
3) Assume a Windows bug and test this on Mac/Linux
4) File a JIRA with a replication case. If there is a full replication
setup, I'll test it on machines I have access to with full debugger
step-through

For example, I wonder if BinFileDataSource is somehow not cleaning up
after the first file properly on Windows and fails to open the second
one.

Regards,
Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/





--
Gary Taylor | www.inovem.com | www.kahootz.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
kahootz.com is a trading name of INOVEM Ltd.



Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-26 Thread Gary Taylor

Alex,

Same results on recursive=true / recursive=false.

I also tried importing plain text files instead of epub (still using 
TikaEntityProcessor though) and get exactly the same result - ie. all 
files fetched, but only one document indexed in Solr.


With verbose output, I get a row for each file in the directory, but 
only the first one has a non-empty documentImport entity.   All 
subsequent documentImport entities just have an empty document#2 entry.  eg:


 
  verbose-output: [
entity:files,
[
  null,
  --- row #1-,
  fileSize,
  2609004,
  fileLastModified,
  2015-02-25T11:37:25.217Z,
  fileAbsolutePath,
  c:\\Users\\gt\\Documents\\epub\\issue018.epub,
  fileDir,
  c:\\Users\\gt\\Documents\\epub,
  file,
  issue018.epub,
  null,
  -,
  entity:documentImport,
  [
document#1,
[
  query,
  c:\\Users\\gt\\Documents\\epub\\issue018.epub,
  time-taken,
  0:0:0.0,
  null,
  --- row #1-,
  text,
   ... parsed epub text - snip ... 
  title,
  Issue 18 title,
  Author,
  Author text,
  null,
  -
],
document#2,
[]
  ],
  null,
  --- row #2-,
  fileSize,
  4428804,
  fileLastModified,
  2015-02-25T11:37:36.399Z,
  fileAbsolutePath,
  c:\\Users\\gt\\Documents\\epub\\issue019.epub,
  fileDir,
  c:\\Users\\gt\\Documents\\epub,
  file,
  issue019.epub,
  null,
  -,
  entity:documentImport,
  [
document#2,
[]
  ],
  null,
  --- row #3-,
  fileSize,
  2580266,
  fileLastModified,
  2015-02-25T11:37:41.188Z,
  fileAbsolutePath,
  c:\\Users\\gt\\Documents\\epub\\issue020.epub,
  fileDir,
  c:\\Users\\gt\\Documents\\epub,
  file,
  issue020.epub,
  null,
  -,
  entity:documentImport,
  [
document#2,
[]
  ],
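
(A verbose run like the excerpt above can also be requested straight from the
handler rather than from the admin UI; the host and the hn2 core name follow
the original post, and the exact parameter set is only a sketch.)

  curl 'http://localhost:8983/solr/hn2/dataimport?command=full-import&clean=false&commit=true&debug=true&verbose=true'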






Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-26 Thread Gary Taylor

Alex,

That's great.  Thanks for the pointers.  I'll try and get more info on 
this and file a JIRA issue.


Kind regards,
Gary.

On 26/02/2015 14:16, Alexandre Rafalovitch wrote:

On 26 February 2015 at 08:32, Gary Taylor g...@inovem.com wrote:

Alex,

Same results on recursive=true / recursive=false.

I also tried importing plain text files instead of epub (still using
TikaEntityProcessor though) and get exactly the same result - ie. all files
fetched, but only one document indexed in Solr.

To me, this would indicate that something is a problem with the inner
DIH entity then. As a next set of steps, I would probably
1) remove both onError statements and see if there is an exception
that is being swallowed.
2) run the import under ProcessMonitor and see if the other files are
actually being read
https://technet.microsoft.com/en-us/library/bb896645.aspx
3) Assume a Windows bug and test this on Mac/Linux
4) File a JIRA with a replication case. If there is a full replication
setup, I'll test it on machines I have access to with full debugger
step-through

For example, I wonder if BinFileDataSource is somehow not cleaning up
after the first file properly on Windows and fails to open the second
one.

Regards,
Alex.


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/



--
Gary Taylor | www.inovem.com | www.kahootz.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
kahootz.com is a trading name of INOVEM Ltd.



Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-25 Thread Gary Taylor
I can't get the FileListEntityProcessor and TikaEntityProcessor to 
correctly add a Solr document for each epub file in my local directory.


I've just downloaded Solr 5.0.0, on a Windows 7 PC.   I ran solr start 
and then solr create -c hn2 to create a new core.


I want to index a load of epub files that I've got in a directory. So I 
created a data-import.xml (in solr\hn2\conf):


<dataConfig>
    <dataSource type="BinFileDataSource" name="bin" />
    <document>
        <entity name="files" dataSource="null" rootEntity="false"
                processor="FileListEntityProcessor"
                baseDir="c:/Users/gt/Documents/epub" fileName=".*epub"
                onError="skip"
                recursive="true">
            <field column="fileAbsolutePath" name="id" />
            <field column="fileSize" name="size" />
            <field column="fileLastModified" name="lastModified" />

            <entity name="documentImport" processor="TikaEntityProcessor"
                    url="${files.fileAbsolutePath}" format="text"
                    dataSource="bin" onError="skip">

                <field column="file" name="fileName"/>
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
                <field column="text" name="content"/>
            </entity>
        </entity>
    </document>
</dataConfig>

In my solrconfig.xml, I added a requestHandler entry to reference my 
data-import.xml:


  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-import.xml</str>
    </lst>
  </requestHandler>

I renamed managed-schema to schema.xml, and ensured the following doc 
fields were setup:


  <field name="id" type="string" indexed="true" stored="true"
         required="true" multiValued="false" />

  <field name="fileName" type="string" indexed="true" stored="true" />
  <field name="author" type="string" indexed="true" stored="true" />
  <field name="title" type="string" indexed="true" stored="true" />

  <field name="size" type="long" indexed="true" stored="true" />
  <field name="lastModified" type="date" indexed="true" stored="true" />

  <field name="content" type="text_en" indexed="false" stored="true" multiValued="false"/>
  <field name="text" type="text_en" indexed="true" stored="false" multiValued="true"/>

<copyField source="content" dest="text"/>

I copied all the jars from dist and contrib\* into server\solr\lib.

Stopping and restarting solr then creates a new managed-schema file and 
renames schema.xml to schema.xml.back


All good so far.

Now I go to the web admin for dataimport 
(http://localhost:8983/solr/#/hn2/dataimport//dataimport) and try and 
execute a full import.


But, the results show Requests: 0, Fetched: 58, Skipped: 0, 
Processed:1 - ie. it only adds one document (the very first one) even 
though it's iterated over 58!


No errors are reported in the logs.

I can search on the contents of that first epub document, so it's 
extracting OK in Tika, but there's a problem somewhere in my config 
that's causing only 1 document to be indexed in Solr.


Thanks for any assistance / pointers.

Regards,
Gary

--
Gary Taylor | www.inovem.com | www.kahootz.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
kahootz.com is a trading name of INOVEM Ltd.



Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-25 Thread Gary Taylor

Alex,

Thanks for the suggestions.  It always just indexes 1 doc, regardless of 
the first epub file it sees.  Debug / verbose don't show anything 
obvious to me.  I can include the output here if you think it would help.


I tried using the SimplePostTool first (java -Dtype=application/epub+zip
-Durl=http://localhost:8983/solr/hn1/update/extract -jar post.jar
\Users\gt\Documents\epub\*.epub) to index the docs and check the Tika
parsing, and that works OK, so I don't think it's the epubs.


I was trying to use DIH so that I could more easily specify the schema 
fields and store content in the index in preparation for trying out the 
search highlighting. Couldn't work out how to do that with post.jar 


Thanks,
Gary
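
(One possibility for the post.jar route: the tool accepts a "params" system
property whose value is appended to the request, so ExtractingRequestHandler
options such as literal.*, fmap.* and uprefix can be passed along. The values
below are only illustrative, and a literal.* field gives every file in the
batch the same value, which is why DIH still fits per-file metadata better.)

  java -Dtype=application/epub+zip \
       -Durl=http://localhost:8983/solr/hn2/update/extract \
       -Dparams="fmap.content=content&uprefix=ignored_&literal.source=epub" \
       -jar post.jar \Users\gt\Documents\epub\*.epub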

On 25/02/2015 17:09, Alexandre Rafalovitch wrote:

Try removing that first epub from the directory and rerunning. If you
now index 0 documents, then there is something unexpected about them
and DIH skips. If it indexes 1 document again but a different one,
then it is definitely something about the repeat logic.

Also, try running with debug and verbose modes and see if something
specific shows up.

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 25 February 2015 at 11:14, Gary Taylor g...@inovem.com wrote:

I can't get the FileListEntityProcessor and TikaEntityProcessor to correctly
add a Solr document for each epub file in my local directory.

I've just downloaded Solr 5.0.0, on a Windows 7 PC.   I ran solr start and
then solr create -c hn2 to create a new core.

I want to index a load of epub files that I've got in a directory. So I
created a data-import.xml (in solr\hn2\conf):

<dataConfig>
    <dataSource type="BinFileDataSource" name="bin" />
    <document>
        <entity name="files" dataSource="null" rootEntity="false"
                processor="FileListEntityProcessor"
                baseDir="c:/Users/gt/Documents/epub" fileName=".*epub"
                onError="skip"
                recursive="true">
            <field column="fileAbsolutePath" name="id" />
            <field column="fileSize" name="size" />
            <field column="fileLastModified" name="lastModified" />

            <entity name="documentImport" processor="TikaEntityProcessor"
                    url="${files.fileAbsolutePath}" format="text"
                    dataSource="bin" onError="skip">
                <field column="file" name="fileName"/>
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
                <field column="text" name="content"/>
            </entity>
        </entity>
    </document>
</dataConfig>

In my solrconfig.xml, I added a requestHandler entry to reference my
data-import.xml:

   <requestHandler name="/dataimport"
                   class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">data-import.xml</str>
     </lst>
   </requestHandler>

I renamed managed-schema to schema.xml, and ensured the following doc fields
were setup:

   <field name="id" type="string" indexed="true" stored="true"
          required="true" multiValued="false" />
   <field name="fileName" type="string" indexed="true" stored="true" />
   <field name="author" type="string" indexed="true" stored="true" />
   <field name="title" type="string" indexed="true" stored="true" />

   <field name="size" type="long" indexed="true" stored="true" />
   <field name="lastModified" type="date" indexed="true" stored="true" />

   <field name="content" type="text_en" indexed="false" stored="true" multiValued="false"/>
   <field name="text" type="text_en" indexed="true" stored="false" multiValued="true"/>

 <copyField source="content" dest="text"/>

I copied all the jars from dist and contrib\* into server\solr\lib.

Stopping and restarting solr then creates a new managed-schema file and
renames schema.xml to schema.xml.back

All good so far.

Now I go to the web admin for dataimport
(http://localhost:8983/solr/#/hn2/dataimport//dataimport) and try and
execute a full import.

But, the results show Requests: 0, Fetched: 58, Skipped: 0, Processed:1 -
ie. it only adds one document (the very first one) even though it's iterated
over 58!

No errors are reported in the logs.

I can search on the contents of that first epub document, so it's extracting
OK in Tika, but there's a problem somewhere in my config that's causing only
1 document to be indexed in Solr.

Thanks for any assistance / pointers.

Regards,
Gary

--
Gary Taylor | www.inovem.com | www.kahootz.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
kahootz.com is a trading name of INOVEM Ltd.



--
Gary Taylor | www.inovem.com | www.kahootz.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE
kahootz.com is a trading name of INOVEM Ltd.



Re:

2013-07-23 Thread Gary Young
Can anyone remove this spammer please?


On Tue, Jul 23, 2013 at 4:47 AM, wired...@yahoo.com wrote:


 Hi!   http://mackieprice.org/cbs.com.network.html




Re: Is it possible to searh Solr with a longer query string?

2013-06-26 Thread Gary Young
Oh this is good!


On Wed, Jun 26, 2013 at 12:05 PM, Shawn Heisey s...@elyograg.org wrote:

 On 6/25/2013 6:15 PM, Jack Krupansky wrote:
  Are you using Tomcat?
 
  See:
  http://wiki.apache.org/solr/SolrTomcat#Enabling_Longer_Query_Requests
 
  Enabling Longer Query Requests
 
  If you try to submit too long a GET query to Solr, then Tomcat will
  reject your HTTP request on the grounds that the HTTP header is too
  large; symptoms may include an HTTP 400 Bad Request error or (if you
  execute the query in a web browser) a blank browser window.
 
  If you need to enable longer queries, you can set the maxHttpHeaderSize
  attribute on the HTTP Connector element in your server.xml file. The
  default value is 4K. (See
  http://tomcat.apache.org/tomcat-5.5-doc/config/http.html)

 Even better would be to force SolrJ to use a POST request.  In newer
 versions (4.1 and later) Solr sets the servlet container's POST buffer
 size and defaults it to 2MB.  In older versions, you'd have to adjust
 this in your servlet container config, but the default should be
 considerably larger than the header buffer used for GET requests.

 I thought that SolrJ used POST by default, but after looking at the
 code, it seems that I was wrong.  Here's how to send a POST query:

 response = server.query(query, METHOD.POST);

 The import required for this is:

 import org.apache.solr.client.solrj.SolrRequest.METHOD;

 Gary, if you can avoid it, you should not be creating a new
 HttpSolrServer object every time you make a query.  It is completely
 thread-safe, so create a singleton and use it for all queries against
 the medline core.

 Thanks,
 Shawn
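
 (The same idea also works outside SolrJ: curl sends parameters as a POST body,
 so a long query never hits the GET header limit. Host and core name below are
 placeholders.)

   curl http://localhost:8983/solr/medline/select \
        --data-urlencode 'q=... very long query ...' \
        --data-urlencode 'rows=10' \
        --data-urlencode 'wt=json'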




Re: doc cache issues... query-time way to bypass cache?

2013-03-23 Thread Gary Yngve
Sigh, user error.

I missed this in the 4.1 release notes:

Collections that do not specify numShards at collection creation time use
custom sharding and default to the implicit router. Document updates
received by a shard will be indexed to that shard, unless a *shard*
parameter or document field names a different shard.
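
In other words, numShards has to be passed at creation time if hash-based
(compositeId) routing is wanted; a sketch, with the collection and config names
as placeholders:

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2&collection.configName=myconf'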


On Fri, Mar 22, 2013 at 3:39 PM, Gary Yngve gary.yn...@gmail.com wrote:

 I have a situation we just discovered in solr4.2 where there are
 previously cached results from a limited field list, and when querying for
 the whole field list, it responds differently depending on which shard gets
 the query (no extra replicas).  It either returns the document on the
 limited field list or the full field list.

 We're releasing tonight, so is there a query param to selectively bypass
 the cache, which I can use as a temp fix?

 Thanks,
 Gary



Re: overseer queue clogged

2013-03-22 Thread Gary Yngve
Thanks, Mark!

The core node names in the solr.xml in solr4.2 is great!  Maybe in 4.3 it
can be supported via API?

Also I am glad you mentioned in other post the chance to namespace
zookeeper by adding a path to the end of the comma-delim zk hosts.  That
works out really well in our situation for having zk serve multiple amazon
environments that go up and down independently of each other -- no issues
w/ shared clusterstate.json or overseers.
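
For anyone looking for that option: the chroot goes on the end of the zkHost
string, and the path has to exist in ZooKeeper before Solr starts (the /staging
suffix, the hosts and the script location here are placeholders):

  # create the chroot once, e.g. with the zkcli that ships with Solr
  cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 -cmd makepath /staging
  # then point that environment's Solr nodes at it
  java -DzkHost=zk1:2181,zk2:2181,zk3:2181/staging -jar start.jar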

Regarding our original problem, we were able to restart all our shards but
one, which wasn't getting past
Mar 20, 2013 5:12:54 PM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change has occurred - updating...
Mar 20, 2013 5:12:54 PM org.apache.zookeeper.ClientCnxn$EventThread
processEvent
SEVERE: Error while calling watcher
java.lang.NullPointerException
at
org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:201)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

We ended up upgrading to solr4.2 and rebuilding the whole index from our
datastore.

-Gary


On Sat, Mar 16, 2013 at 9:51 AM, Mark Miller markrmil...@gmail.com wrote:

 Yeah, I don't know that I've ever tried with 4.0, but I've done this with
 4.1 and 4.2.

 - Mark

 On Mar 16, 2013, at 12:19 PM, Gary Yngve gary.yn...@gmail.com wrote:

  Cool, I'll need to try this.  I could have sworn that it didn't work that
  way in 4.0, but maybe my test was bunk.
 
  -g
 
 
  On Fri, Mar 15, 2013 at 9:41 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  You can do this - just modify your starting Solr example to have no
 cores
  in solr.xml. You won't be able to make use of the admin UI until you
 create
  at least one core, but the core and collection apis will both work fine.




doc cache issues... query-time way to bypass cache?

2013-03-22 Thread Gary Yngve
I have a situation we just discovered in solr4.2 where there are previously
cached results from a limited field list, and when querying for the whole
field list, it responds differently depending on which shard gets the query
(no extra replicas).  It either returns the document on the limited field
list or the full field list.

We're releasing tonight, so is there a query param to selectively bypass
the cache, which I can use as a temp fix?

Thanks,
Gary


Re: overseer queue clogged

2013-03-16 Thread Gary Yngve
Cool, I'll need to try this.  I could have sworn that it didn't work that
way in 4.0, but maybe my test was bunk.

-g


On Fri, Mar 15, 2013 at 9:41 PM, Mark Miller markrmil...@gmail.com wrote:

 You can do this - just modify your starting Solr example to have no cores
 in solr.xml. You won't be able to make use of the admin UI until you create
 at least one core, but the core and collection apis will both work fine.
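
A hedged sketch of what that looks like once a node comes up with no cores: a
core for an existing collection can be added over HTTP through the CoreAdmin
API (all names below are placeholders):

  curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=production_things_shard3_replica2&collection=production_things&shard=shard3'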


Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
Sorry, should have specified.  4.1




On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller markrmil...@gmail.com wrote:

 What Solr version? 4.0, 4.1 4.2?

 - Mark

 On Mar 15, 2013, at 7:19 PM, Gary Yngve gary.yn...@gmail.com wrote:

  my solr cloud has been running fine for weeks, but about a week ago, it
  stopped dequeueing from the overseer queue, and now there are thousands
 of
  tasks on the queue, most which look like
 
  {
    "operation":"state",
    "numShards":null,
    "shard":"shard3",
    "roles":null,
    "state":"recovering",
    "core":"production_things_shard3_2",
    "collection":"production_things",
    "node_name":"10.31.41.59:8883_solr",
    "base_url":"http://10.31.41.59:8883/solr"}
 
  i'm trying to create a new collection through collection API, and
  obviously, nothing is happening...
 
  any suggestion on how to fix this?  drop the queue in zk?
 
  how could did it have gotten in this state in the first place?
 
  thanks,
  gary




Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
Also, looking at overseer_elect, everything looks fine.  node is valid and
live.


On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve gary.yn...@gmail.com wrote:

 Sorry, should have specified.  4.1




 On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller markrmil...@gmail.comwrote:

 What Solr version? 4.0, 4.1 4.2?

 - Mark

 On Mar 15, 2013, at 7:19 PM, Gary Yngve gary.yn...@gmail.com wrote:

  my solr cloud has been running fine for weeks, but about a week ago, it
  stopped dequeueing from the overseer queue, and now there are thousands
 of
  tasks on the queue, most which look like
 
  {
    "operation":"state",
    "numShards":null,
    "shard":"shard3",
    "roles":null,
    "state":"recovering",
    "core":"production_things_shard3_2",
    "collection":"production_things",
    "node_name":"10.31.41.59:8883_solr",
    "base_url":"http://10.31.41.59:8883/solr"}
 
  i'm trying to create a new collection through collection API, and
  obviously, nothing is happening...
 
  any suggestion on how to fix this?  drop the queue in zk?
 
  how could did it have gotten in this state in the first place?
 
  thanks,
  gary





Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
I restarted the overseer node and another took over, queues are empty now.

the server with core production_things_shard1_2
is having these errors:

shard update error RetryNode:
http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException:
Server refused connection at:
http://10.104.59.189:8883/solr/production_things_shard11_replica1

  for shard11!!!

I also got some strange errors on the restarted node.  Makes me wonder if
there is a string-matching bug for shard1 vs shard11?

SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk
  at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771)
  at org.apache.solr.cloud.ZkController.register(ZkController.java:683)
  at org.apache.solr.cloud.ZkController.register(ZkController.java:634)
  at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890)
  at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874)
  at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823)
  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633)
  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: There is conflicting
information about the leader
of shard: shard1 our state
says:http://10.104.59.189:8883/solr/collection1/but zookeeper
says:http
://10.217.55.151:8883/solr/collection1/
  at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756)

INFO: Releasing
directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar
d11_replica1/data/index
Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher
  at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535)

SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state
recovering for 10.76.31.
67:8883_solr but I still do not see the requested state. I see state:
active live:true
  at
org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler
.java:948)




On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller markrmil...@gmail.com wrote:

 Strange - we hardened that loop in 4.1 - so I'm not sure what happened
 here.

 Can you do a stack dump on the overseer and see if you see an Overseer
 thread running perhaps? Or just post the results?

 To recover, you should be able to just restart the Overseer node and have
 someone else take over - they should pick up processing the queue.

 Any logs you might be able to share could be useful too.

 - Mark

 On Mar 15, 2013, at 7:51 PM, Gary Yngve gary.yn...@gmail.com wrote:

  Also, looking at overseer_elect, everything looks fine.  node is valid
 and
  live.
 
 
  On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve gary.yn...@gmail.com
 wrote:
 
  Sorry, should have specified.  4.1
 
 
 
 
  On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  What Solr version? 4.0, 4.1 4.2?
 
  - Mark
 
  On Mar 15, 2013, at 7:19 PM, Gary Yngve gary.yn...@gmail.com wrote:
 
  my solr cloud has been running fine for weeks, but about a week ago,
 it
  stopped dequeueing from the overseer queue, and now there are
 thousands
  of
  tasks on the queue, most which look like
 
  {
    "operation":"state",
    "numShards":null,
    "shard":"shard3",
    "roles":null,
    "state":"recovering",
    "core":"production_things_shard3_2",
    "collection":"production_things",
    "node_name":"10.31.41.59:8883_solr",
    "base_url":"http://10.31.41.59:8883/solr"}
 
  i'm trying to create a new collection through collection API, and
  obviously, nothing is happening...
 
  any suggestion on how to fix this?  drop the queue in zk?
 
  how could did it have gotten in this state in the first place?
 
  thanks,
  gary
 
 
 




Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
it doesn't appear to be a shard1 vs shard11 issue... 60% of my followers
are red now in the solr cloud graph.. trying to figure out what that
means...


On Fri, Mar 15, 2013 at 6:48 PM, Gary Yngve gary.yn...@gmail.com wrote:

 I restarted the overseer node and another took over, queues are empty now.

 the server with core production_things_shard1_2
 is having these errors:

 shard update error RetryNode:
 http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException:
 Server refused connection at:
 http://10.104.59.189:8883/solr/production_things_shard11_replica1

   for shard11!!!

 I also got some strange errors on the restarted node.  Makes me wonder if
 there is a string-matching bug for shard1 vs shard11?

 SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk
   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771)
   at org.apache.solr.cloud.ZkController.register(ZkController.java:683)
   at org.apache.solr.cloud.ZkController.register(ZkController.java:634)
   at
 org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890)
   at
 org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874)
   at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.solr.common.SolrException: There is conflicting
 information about the leader
 of shard: shard1 our state says:
 http://10.104.59.189:8883/solr/collection1/ but zookeeper says:http
 ://10.217.55.151:8883/solr/collection1/
   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756)

 INFO: Releasing
 directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar
 d11_replica1/data/index
 Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher
   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535)

 SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state
 recovering for 10.76.31.
 67:8883_solr but I still do not see the requested state. I see state:
 active live:true
   at
 org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler
 .java:948)




 On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller markrmil...@gmail.comwrote:

 Strange - we hardened that loop in 4.1 - so I'm not sure what happened
 here.

 Can you do a stack dump on the overseer and see if you see an Overseer
 thread running perhaps? Or just post the results?

 To recover, you should be able to just restart the Overseer node and have
 someone else take over - they should pick up processing the queue.

 Any logs you might be able to share could be useful too.

 - Mark

 On Mar 15, 2013, at 7:51 PM, Gary Yngve gary.yn...@gmail.com wrote:

  Also, looking at overseer_elect, everything looks fine.  node is valid
 and
  live.
 
 
  On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve gary.yn...@gmail.com
 wrote:
 
  Sorry, should have specified.  4.1
 
 
 
 
  On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  What Solr version? 4.0, 4.1 4.2?
 
  - Mark
 
  On Mar 15, 2013, at 7:19 PM, Gary Yngve gary.yn...@gmail.com wrote:
 
  my solr cloud has been running fine for weeks, but about a week ago,
 it
  stopped dequeueing from the overseer queue, and now there are
 thousands
  of
  tasks on the queue, most which look like
 
  {
    "operation":"state",
    "numShards":null,
    "shard":"shard3",
    "roles":null,
    "state":"recovering",
    "core":"production_things_shard3_2",
    "collection":"production_things",
    "node_name":"10.31.41.59:8883_solr",
    "base_url":"http://10.31.41.59:8883/solr"}
 
  i'm trying to create a new collection through collection API, and
  obviously, nothing is happening...
 
  any suggestion on how to fix this?  drop the queue in zk?
 
  how could did it have gotten in this state in the first place?
 
  thanks,
  gary
 
 
 





Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
i think those followers are red from trying to forward requests to the
overseer while it was being restarted.  i guess i'll see if they become
green over time.  or i guess i can restart them one at a time..


On Fri, Mar 15, 2013 at 6:53 PM, Gary Yngve gary.yn...@gmail.com wrote:

 it doesn't appear to be a shard1 vs shard11 issue... 60% of my followers
 are red now in the solr cloud graph.. trying to figure out what that
 means...


 On Fri, Mar 15, 2013 at 6:48 PM, Gary Yngve gary.yn...@gmail.com wrote:

 I restarted the overseer node and another took over, queues are empty now.

 the server with core production_things_shard1_2
 is having these errors:

 shard update error RetryNode:
 http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException:
 Server refused connection at:
 http://10.104.59.189:8883/solr/production_things_shard11_replica1

   for shard11!!!

 I also got some strange errors on the restarted node.  Makes me wonder if
 there is a string-matching bug for shard1 vs shard11?

 SEVERE: :org.apache.solr.common.SolrException: Error getting leader from
 zk
   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771)
   at org.apache.solr.cloud.ZkController.register(ZkController.java:683)
   at org.apache.solr.cloud.ZkController.register(ZkController.java:634)
   at
 org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890)
   at
 org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874)
   at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.solr.common.SolrException: There is conflicting
 information about the leader
 of shard: shard1 our state says:
 http://10.104.59.189:8883/solr/collection1/ but zookeeper says:http
 ://10.217.55.151:8883/solr/collection1/
   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756)

 INFO: Releasing
 directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar
 d11_replica1/data/index
 Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Error opening new searcher
   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535)

 SEVERE: org.apache.solr.common.SolrException: I was asked to wait on
 state recovering for 10.76.31.
 67:8883_solr but I still do not see the requested state. I see state:
 active live:true
   at
 org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler
 .java:948)




 On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller markrmil...@gmail.comwrote:

 Strange - we hardened that loop in 4.1 - so I'm not sure what happened
 here.

 Can you do a stack dump on the overseer and see if you see an Overseer
 thread running perhaps? Or just post the results?

 To recover, you should be able to just restart the Overseer node and
 have someone else take over - they should pick up processing the queue.

 Any logs you might be able to share could be useful too.

 - Mark

 On Mar 15, 2013, at 7:51 PM, Gary Yngve gary.yn...@gmail.com wrote:

  Also, looking at overseer_elect, everything looks fine.  node is valid
 and
  live.
 
 
  On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve gary.yn...@gmail.com
 wrote:
 
  Sorry, should have specified.  4.1
 
 
 
 
  On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  What Solr version? 4.0, 4.1 4.2?
 
  - Mark
 
  On Mar 15, 2013, at 7:19 PM, Gary Yngve gary.yn...@gmail.com
 wrote:
 
  my solr cloud has been running fine for weeks, but about a week
 ago, it
  stopped dequeueing from the overseer queue, and now there are
 thousands
  of
  tasks on the queue, most which look like
 
  {
    "operation":"state",
    "numShards":null,
    "shard":"shard3",
    "roles":null,
    "state":"recovering",
    "core":"production_things_shard3_2",
    "collection":"production_things",
    "node_name":"10.31.41.59:8883_solr",
    "base_url":"http://10.31.41.59:8883/solr"}
 
  i'm trying to create a new collection through collection API, and
  obviously, nothing is happening...
 
  any suggestion on how to fix this?  drop the queue in zk?
 
  how could did it have gotten in this state in the first place?
 
  thanks,
  gary
 
 
 






Re: overseer queue clogged

2013-03-15 Thread Gary Yngve
I will upgrade to 4.2 this weekend and see what happens.  We are on ec2 and
have had a few issues with hostnames with both zk and solr. (but in this
case i haven't rebooted any instances either)

it's relatively a pain to do the upgrade because we have a query/scorer
fork of lucene along with supplemental jars, and zk cannot distribute
binary jars via the config.

we are also multi-collection per zk... i wish it didn't require a core
always defined up front for the core admin?  i would love to have an
instance have no cores and then just create the core i need..

-g



On Fri, Mar 15, 2013 at 7:14 PM, Mark Miller markrmil...@gmail.com wrote:


 On Mar 15, 2013, at 10:04 PM, Gary Yngve gary.yn...@gmail.com wrote:

  i think those followers are red from trying to forward requests to the
  overseer while it was being restarted.  i guess i'll see if they become
  green over time.  or i guess i can restart them one at a time..

 Restarting the cluster should clear things up. It shouldn't take too long for
 those nodes to recover though - they should have been up to date before.
 The couple exceptions you posted def indicate something is out of whack.
 It's something I'd like to get to the bottom of.

 - Mark

 
 
  On Fri, Mar 15, 2013 at 6:53 PM, Gary Yngve gary.yn...@gmail.com
 wrote:
 
  it doesn't appear to be a shard1 vs shard11 issue... 60% of my followers
  are red now in the solr cloud graph.. trying to figure out what that
  means...
 
 
  On Fri, Mar 15, 2013 at 6:48 PM, Gary Yngve gary.yn...@gmail.com
 wrote:
 
  I restarted the overseer node and another took over, queues are empty
 now.
 
  the server with core production_things_shard1_2
  is having these errors:
 
  shard update error RetryNode:
 
 http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException
 :
  Server refused connection at:
  http://10.104.59.189:8883/solr/production_things_shard11_replica1
 
   for shard11!!!
 
  I also got some strange errors on the restarted node.  Makes me wonder
 if
  there is a string-matching bug for shard1 vs shard11?
 
  SEVERE: :org.apache.solr.common.SolrException: Error getting leader
 from
  zk
   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771)
   at org.apache.solr.cloud.ZkController.register(ZkController.java:683)
   at org.apache.solr.cloud.ZkController.register(ZkController.java:634)
   at
  org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890)
   at
  org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874)
   at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at
  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:722)
  Caused by: org.apache.solr.common.SolrException: There is conflicting
  information about the leader
  of shard: shard1 our state says:
  http://10.104.59.189:8883/solr/collection1/ but zookeeper says:http
  ://10.217.55.151:8883/solr/collection1/
   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756)
 
  INFO: Releasing
 
 directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shar
  d11_replica1/data/index
  Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log
  SEVERE: org.apache.solr.common.SolrException: Error opening new
 searcher
   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535)
 
  SEVERE: org.apache.solr.common.SolrException: I was asked to wait on
  state recovering for 10.76.31.
  67:8883_solr but I still do not see the requested state. I see state:
  active live:true
   at
 
 org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler
  .java:948)
 
 
 
 
  On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Strange - we hardened that loop in 4.1 - so I'm not sure what happened
  here.
 
  Can you do a stack dump on the overseer and see if you see an Overseer
  thread running perhaps? Or just post the results?
 
  To recover, you should be able to just restart the Overseer node and
  have someone else take over - they should pick up processing the
 queue.
 
  Any logs you might be able to share could be useful too.
 
  - Mark
 
  On Mar 15, 2013, at 7:51 PM, Gary Yngve gary.yn...@gmail.com wrote

Re: How to use shardId

2013-02-20 Thread Gary Yngve
the param in solr.xml should be shard, not shardId.  i tripped over this
too.

-g



On Mon, Jan 14, 2013 at 7:01 AM, starbuck thomas.ma...@fiz-karlsruhe.dewrote:

 Hi all,

 I am trying to set up a solr cloud cluster with 2 collections, each with 4
 shards and 2 replicas, hosted by 4 solr instances. If the shardNum param is set
 to 4 and all solr instances are started one after another, it seems to work
 fine.

 What I wanted to do now is removing shardNum from JAVA_OPTS and defining
 each core with a shardId. Here is my current solr.xml of the first and
 second (in the second there is another instanceDir, the rest is the same)
 solr instance:



 Here is solr.xml of the third and fourth solr instance:



 But it seems that solr doesn't accept the shardId, or omits it. What I
 really get is 2 collections, each with 2 shards and 8 replicas (2 per solr
 instance).
 Either the functionality is not really clear to me or there has to be a
 config failure.

 It would very helpful if anyone could give me a hint.

 Thanks.
 starbuck





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-use-shardId-tp4033186.html
 Sent from the Solr - User mailing list archive at Nabble.com.



solr4.1 createNodeSet requires ip addresses?

2013-02-15 Thread Gary Yngve
Hi all,

I've been unable to get the collections create API to work with
createNodeSet containing hostnames, both localhost and external hostnames.
 I've only been able to get it working when using explicit IP addresses.

It looks like zk stores the IP addresses in the clusterstate.json and
live_nodes.  Is it possible that Solr Cloud is not doing any hostname
resolving but just looking for an explicit match with createNodeSet?  This
is kind of annoying, in that I am working with EC2 instances and consider
it pretty lame to need to use elastic IPs for internal use.  I'm hacking
around it now (looking up the eth0 inet addr on each machine), but I'm not
happy about it.

Has anyone else found a better solution?

The reason I want to specify explicit nodes for collections is so I can
have just one zk ensemble managing collections across different
environments that will go up and down independently of each other.

Thanks,
Gary
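
For reference, the kind of call being described looks like the following, where
each createNodeSet entry has to match a node_name exactly as it appears under
/live_nodes (IPs, ports and names here are placeholders):

  curl 'http://10.0.1.11:8983/solr/admin/collections?action=CREATE&name=env_a_things&numShards=2&replicationFactor=1&createNodeSet=10.0.1.11:8983_solr,10.0.1.12:8983_solr'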


Re: incorrect solr update behavior

2013-01-14 Thread Gary Yngve
Of course, as soon as I post this, I discover this:

https://issues.apache.org/jira/browse/SOLR-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537900#comment-13538174

i'll give this patch a spin in the morning.

(this is not an example of how to use antecedents :))

-g


On Mon, Jan 14, 2013 at 6:27 PM, Gary Yngve gary.yn...@gmail.com wrote:

 Posting this

 <?xml version="1.0" encoding="UTF-8"?>
 <add><doc>
   <field name="nickname_s" update="set">blah</field>
   <field name="tags_ss" update="add">qux</field>
   <field name="tags_ss" update="add">quux</field>
   <field name="id">foo</field>
 </doc></add>

 to an existing doc with foo and bar tags
 results in tags_ss containing

 <arr name="tags_ss">
   <str>{add=qux}</str>
   <str>{add=quux}</str>
 </arr>

 whereas posting this

 <?xml version="1.0" encoding="UTF-8"?>
 <add><doc>
   <field name="nickname_s" update="set">blah</field>
   <field name="tags_ss" update="add">qux</field>
   <field name="id">foo</field>
 </doc></add>

 results in the expected behavior:

 <arr name="tags_ss">
   <str>foo</str>
   <str>bar</str>
   <str>qux</str>
 </arr>

 Any ideas?

 Thanks,
 Gary



RE: dih groovy script question

2012-09-21 Thread Moore, Gary
Looks like some sort of foul-up with Groovy versions and Solr 3.6.1 as  I had 
to roll back to Groovy 1.7.10 to get this to work.   Started with Groovy 2 and 
then 1.8 before 1.7.10.   What's odd is that I implemented the same calls made 
in ScriptTransformer.java in a test program and they worked fine with all 
Groovy versions.  Can't imagine what the root cause might be -- Groovy 
implements jsr223 differently in later versions?  I suppose to find out I could 
compile Solr with my jdk but  time to march on. ;)
Gary

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, September 15, 2012 9:01 AM
To: solr-user@lucene.apache.org
Subject: Re: dih groovy script question

Stab in the dark... This looks like you're somehow getting the wrong Groovy 
jars. Can you print out the Groovy version as a test? Perhaps you have one 
groovy version in your command-line and copied a different version into the 
libraries Solr knows about?

Because this looks like a pure Groovy error

Best
Erick

On Thu, Sep 13, 2012 at 9:03 PM, Moore, Gary gary.mo...@ars.usda.gov wrote:
 I'm a bit stumped as to why I can't get a groovy script to run from the DIH.  
  I'm sure it's something braindead I'm missing.   The script looks like this 
 in data-config.xml:

 <script language="groovy"><![CDATA[
 import java.security.MessageDigest
 import java.util.HashMap
 def createHashId(HashMap<String,Object> row,
     org.apache.solr.handler.dataimport.ContextImpl context) {
   // do groovy stuff
   return row
 }
 ]]></script>

 When I run the import, I get the following error:


 Caused by: java.lang.NoSuchMethodException: No signature of method: 
 org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.createHashId() is 
 applicable for argument types: (java.util.HashMap, 
 org.apache.solr.handler.dataimport.ContextImpl) values: [[Format:Reports, 
 Credits:, EnteredBy:Corey Holland, ...], ...]
 at 
 org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeImpl(GroovyScriptEngineImpl.java:364)
 at 
 org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeFunction(GroovyScriptEngineImpl.java:160)
 ... 13 more

 The script runs fine from the shell so I don't believe there are any groovy 
 errors.  Thanks in advance for any tips.
 Gary




 This electronic message contains information generated by the USDA solely for 
 the intended recipients. Any unauthorized interception of this message or the 
 use or disclosure of the information it contains may violate the law and 
 subject the violator to civil or criminal penalties. If you believe you have 
 received this message in error, please notify the sender and delete the email 
 immediately.




dih groovy script question

2012-09-13 Thread Moore, Gary
I'm a bit stumped as to why I can't get a groovy script to run from the DIH.   
I'm sure it's something braindead I'm missing.   The script looks like this in 
data-config.xml:

<script language="groovy"><![CDATA[
import java.security.MessageDigest
import java.util.HashMap
def createHashId(HashMap<String,Object> row,
    org.apache.solr.handler.dataimport.ContextImpl context) {
  // do groovy stuff
  return row
}
]]></script>

When I run the import, I get the following error:


Caused by: java.lang.NoSuchMethodException: No signature of method: 
org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.createHashId() is applicable 
for argument types: (java.util.HashMap, 
org.apache.solr.handler.dataimport.ContextImpl) values: [[Format:Reports, 
Credits:, EnteredBy:Corey Holland, ...], ...]
at 
org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeImpl(GroovyScriptEngineImpl.java:364)
at 
org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.invokeFunction(GroovyScriptEngineImpl.java:160)
... 13 more

The script runs fine from the shell so I don't believe there are any groovy 
errors.  Thanks in advance for any tips.
Gary




This electronic message contains information generated by the USDA solely for 
the intended recipients. Any unauthorized interception of this message or the 
use or disclosure of the information it contains may violate the law and 
subject the violator to civil or criminal penalties. If you believe you have 
received this message in error, please notify the sender and delete the email 
immediately.


Payloads slowing down add/delete doc

2012-03-02 Thread Gary Yang
Hi, there

In order to keep a DocID vs UID map, we added payload to a solr core. The 
search on UID is very fast but we get a problem with adding/deleting docs.  
Every time we commit an adding/deleting action, solr/lucene will take up to 30 
seconds to complete.  Without payload, the same action can be done in 
milliseconds.

We do need real time commit.

Here is the payload definition:

<fieldType name="payloadfld" class="solr.TextField">
    <analyzer>
        <tokenizer class="org.apache.solr.analysis.KeywordTokenizerFactory"/>
        <filter class="org.apache.solr.analysis.DelimitedPayloadTokenFilterFactory"
                encoder="integer"/>
    </analyzer>
</fieldType>

<field name="uid" type="payloadfld" indexed="true" stored="false" required="true"/>


Any suggestions?

Any help is appreciated.

Best Regards

G. Y.


DIH doesn't handle bound namespaces?

2011-10-31 Thread Moore, Gary
I'm trying to import some MODS XML using DIH.  The XML uses bound namespacing:

<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:mods="http://www.loc.gov/mods/v3"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns="http://www.loc.gov/mods/v3"
      xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/mods/v3/mods-3-4.xsd"
      version="3.4">
    <mods:titleInfo>
        <mods:title>Malus domestica: Arnold</mods:title>
    </mods:titleInfo>
</mods>

However, XPathEntityProcessor doesn't seem to handle xpaths of the type
xpath="//mods:titleInfo/mods:title".

If I remove the namespaces from the source XML:

<mods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:mods="http://www.loc.gov/mods/v3"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns="http://www.loc.gov/mods/v3"
      xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/mods/v3/mods-3-4.xsd"
      version="3.4">
    <titleInfo>
        <title>Malus domestica: Arnold</title>
    </titleInfo>
</mods>

then xpath="//titleInfo/title" works just fine.  Can anyone confirm that this
is the case and, if so, recommend a solution?
Thanks
Gary


Gary Moore
Technical Lead
LCA Digital Commons Project
NAL/ARS/USDA



query for point in time

2011-09-15 Thread gary tam
Hi

I have a scenario that I am not sure how to write the query for.

Here is the scenario - I have an employee record with multi-valued fields for
project, start date, and end date.

looks something like


John Smith   web site bug fix   2010-01-01   2010-01-03
             unit testing       2010-01-04   2010-01-06
             QA support         2010-01-07   2010-01-12
             implementation     2010-01-13   2010-01-22

I want to find what project John Smith was working on 2010-01-05

Is this possible or I have to back to my database ?


Thanks


Re: query for point in time

2011-09-15 Thread gary tam
Thanks for the reply.  We had the search within the database initially, but
it proved to be too slow.  With solr we have much better performance.

One more question, how could I find the most current job for each employee

My data looks like


John Smith   department A   web site bug fix    2010-01-01   2010-01-03
                            unit testing        2010-01-04   2010-01-06
                            QA support          2010-01-07   2010-01-12
                            implementation      2010-01-13   2010-01-22

Jane Doe     department A   QA support          2010-01-01   2010-05-01
                            implementation      2010-05-02   2010-09-28

Joe Doe      department A   PHP development     2011-01-01   2011-08-31
                            Java Development    2011-09-01   2011-09-15

I would like to return this as my search result

John Smith   department A   implementation      2010-01-13   2010-01-22
Jane Doe     department A   implementation      2010-05-02   2010-09-28
Joe Doe      department A   Java Development    2011-09-01   2011-09-15


Thanks in advance
Gary
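
One possible approach, assuming each project assignment is indexed as its own
document with fields such as employee_name, project, start_date and end_date
(hypothetical names): Solr's result grouping (available from around 3.3 onwards)
can return just the latest assignment per employee, for example

http://localhost:8983/solr/select?q=*:*&group=true&group.field=employee_name&group.sort=end_date+desc&group.limit=1

This is only a sketch; with the multi-valued, single-document layout shown above
it would not work, since grouping needs one document per assignment.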



On Thu, Sep 15, 2011 at 3:33 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 You didn't tell us what your schema looks like, what fields with what types
 are involved.

 But similar to how you'd do it in your database, you need to find
 'documents' that have a start date before your date in question, and an end
 date after your date in question, to find the ones whose range includes your
 date in question.

 Something like this:

 q=start_date:[* TO '2010-01-05'] AND end_date:['2010-01-05' TO *]

 Of course, you need to add on your restriction to just documents about
 'John Smith', through another AND clause or an 'fq'.

 But in general, if you've got a db with this info already, and this is all
 you need, why not just use the db?  Multi-hieararchy data like this is going
 to give you trouble in Solr eventually, you've got to arrange the solr
 indexes/schema to answer your questions, and eventually you're going to have
 two questions which require mutually incompatible schema to answer.

 An rdbms is a great general purpose question answering tool for structured
 data.  lucene/Solr is a great indexing tool for text matching.


 On 9/15/2011 2:55 PM, gary tam wrote:

 Hi

 I have a scenario that I am not sure how to write the query for.

 Here is the scenario - have an employee record with multi value for
 project,
 started date, end date.

 looks something like


 John Smith   web site bug fix   2010-01-01   2010-01-03
              unit testing       2010-01-04   2010-01-06
              QA support         2010-01-07   2010-01-12
              implementation     2010-01-13   2010-01-22

 I want to find what project John Smith was working on 2010-01-05

 Is this possible or I have to back to my database ?


 Thanks




RE: how to run solr in apache server?

2011-09-07 Thread Moore, Gary
Solr only runs in a servlet container.  To make it appear as if Solr is running on 
httpd, Google 'httpd tomcat' for instructions on how to front Tomcat with 
httpd via mod_jk or mod_proxy.  Our system admins prefer mod_proxy.  Not sure why 
you'd need to front Solr with httpd, since it's usually an application backend, 
e.g. a PHP application running on port 80 connects to Solr on port 8983.
Gary
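
For reference, a minimal httpd sketch of the mod_proxy approach, assuming
mod_proxy and mod_proxy_http are enabled and Solr is on its default port
(paths are illustrative only):

ProxyPass        /solr http://localhost:8983/solr
ProxyPassReverse /solr http://localhost:8983/solr

With that in place, requests to http://yourhost/solr are forwarded to the
servlet container while httpd stays on port 80.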

-Original Message-
From: nagarjuna [mailto:nagarjuna.avul...@gmail.com] 
Sent: Wednesday, September 07, 2011 7:41 AM
To: solr-user@lucene.apache.org
Subject: how to run solr in apache server?

Hi everybody...
 can anybody tell me how to run solr on Apache server(not apache
tomcat)


Thnax in advance

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-run-solr-in-apache-server-tp3316377p3316377.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: commas in synonyms.txt are not escaping

2011-08-29 Thread Moore, Gary
Hah, I knew it was something simple. :)  Thanks.
Gary

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, August 28, 2011 12:50 PM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Turns out this isn't a bug - I was just tripped up by the analysis
changes to the example server.

Gary, you are probably just hitting the same thing.
The text fieldType is no longer used by any fields by default - for
example the text field uses the text_general fieldType.
This fieldType uses the standard tokenizer, which discards stuff like
commas (hence the synonym will never match).

-Yonik
http://www.lucidimagination.com


commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary

I have a number of chemical names containing commas which I'm mapping in 
index_synonyms.txt thusly:

2\,4-D-butotyl => Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 
3,CCRIS 8562

According to the sample synonyms.txt, the comma above should be escaped, i.e. 
a\,a => b\,b.  The problem is that according to analysis.jsp the commas are not 
being escaped.  If I paste in 2,4-D-butotyl, then there are no mappings.  If I paste in 
2\,4-D-butotyl, the mappings are done.  This is verified by there being no 
mappings in the index.  I assume there would be mappings if 2\,4-D-butotyl actually 
appeared in a document.

The filter I'm declaring in the index analyzer looks like this:

<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
        tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
        expand="true"/>

It doesn't seem to matter which tokenizer I use.  This must be something simple 
that I'm not doing, but I'm a bit stumped at the moment and would appreciate any 
tips.
Thanks
Gary




RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Here you go -- I'm just hacking the text field at the moment.  Thanks,
Gary

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
            expand="true"/>
    <!-- Case insensitive stop word removal.
         enablePositionIncrements=true ensures that a 'gap' is left to
         allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!--filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" tokenizerFactory="solr.KeywordTokenizerFactory"
        expand="true"/-->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

-Original Message-
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] 
Sent: Friday, August 26, 2011 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Gary, please post the entire field declaration so I can try to reproduce
here




RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Thanks, Yonik.
Gary

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Friday, August 26, 2011 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

On Fri, Aug 26, 2011 at 11:16 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Fri, Aug 26, 2011 at 10:17 AM, Moore, Gary gary.mo...@ars.usda.gov wrote:

 I have a number of chemical names containing commas which I'm mapping in 
 index_synonyms.txt thusly:

 2\,4-D-butotyl => Aqua-Kleen,BRN 1996617,Bladex-B,Brush killer 64,Butoxy-D 
 3,CCRIS 8562

 According to the sample synonyms.txt, the comma above should be escaped, i.e. 
 a\,a => b\,b.    The problem is that according to analysis.jsp the commas are 
 not being escaped.  If I paste in 2,4-D-butotyl, then no mappings.  If I 
 paste in 2\,4-D-butotyl, the mappings are done.


 I can confirm that this works in 1.4, but no longer works in 3x or
 trunk.  Can you open an issue?

Actually, I think I've tracked it to LUCENE-3233 where the parsing
rules were moved from Solr to Lucene (and changed the functionality in
the process).
I'll reopen t hat since I don't think it's been in a released version yet.

-Yonik
http://www.lucidimagination.com


RE: commas in synonyms.txt are not escaping

2011-08-26 Thread Moore, Gary
Alexi,
Yes but no difference.  This is apparently an issue introduced in 3.*.  Thanks 
for your help.
-Gary

-Original Message-
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] 
Sent: Friday, August 26, 2011 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: commas in synonyms.txt are not escaping

Gary, isn't your wordDelimiter removing your commas in the query time? have
u tried it in the analyzer?

2011/8/26 Moore, Gary gary.mo...@ars.usda.gov

 Here you go -- I'm just hacking the text field at the moment.  Thanks,
 Gary

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
             tokenizerFactory="solr.KeywordTokenizerFactory" ignoreCase="true"
             expand="true"/>
     <!-- Case insensitive stop word removal.
          enablePositionIncrements=true ensures that a 'gap' is left to
          allow for accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory"
             protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!--filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" tokenizerFactory="solr.KeywordTokenizerFactory"
         expand="true"/-->
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory"
             protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 -Original Message-
 From: Alexei Martchenko [mailto:ale...@superdownloads.com.br]
 Sent: Friday, August 26, 2011 10:30 AM
 To: solr-user@lucene.apache.org
 Subject: Re: commas in synonyms.txt are not escaping

 Gary, please post the entire field declaration so I can try to reproduce
 here





-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: tika integration exception and other related queries

2011-06-09 Thread Gary Taylor

Naveen,

Not sure our requirement matches yours, but one of the things we index 
is a comment item that can have one or more files attached to it.  To 
index the whole thing as a single Solr document we create a zipfile 
containing a file with the comment details in it and any additional 
attached files.  This is submitted to Solr as a TEXT field in an XML 
doc, along with other meta-data fields from the comment.  In our schema 
the TEXT field is indexed but not stored, so when we search and get a 
match back it doesn't contain all of the contents from the attached 
files etc., only the stored fields in our schema.   Admittedly, the user 
can therefore get back a comment match with no indication as to WHERE 
the match occurred (ie. was it in the meta-data or the contents of the 
attached files), but at the moment we're only interested in getting 
appropriate matches, not explaining where the match is.


Hope that helps.

Kind regards,
Gary.



On 09/06/2011 03:00, Naveen Gupta wrote:

Hi Gary

It started working. Though I did not test Zip files, it is working fine for
rar files.

The only thing I wanted to do is to index the metadata (text mapped to
content) but not store the data. Also, in the search results I want to filter
the content out, and that started working fine. I don't want to show the
extracted content to the end user, since the way it extracts the information is
not very helpful to the user. Although we can apply a few of the analyzers and
filters to remove the unnecessary tags, the information would still not be of
much help. I am looking for your opinion: what did you do in order to filter
out the content, or are you showing the extracted content to the end user?

Even in the case where we are showing the text to the end user, how can I limit
the number of characters returned with the search results? Is there any
feature where we can achieve this - the concept of a snippet kind of thing?

Thanks
Naveen
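
On the snippet question: a sketch of one option, assuming the extracted text is
kept in a stored field (here called text, a placeholder name). Solr's
highlighting parameters can return trimmed fragments instead of the whole
stored value, e.g.

http://localhost:8983/solr/select?q=text:report&hl=true&hl.fl=text&hl.snippets=1&hl.fragsize=100

hl.fragsize controls the approximate snippet length in characters; note that the
field must be stored for highlighting to work.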

On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor <g...@inovem.com>  wrote:


Naveen,

For indexing Zip files with Tika, take a look at the following thread :


http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html

I got it to work with the 3.1 source and a couple of patches.

Hope this helps.

Regards,
Gary.



On 08/06/2011 04:12, Naveen Gupta wrote:


Hi Can somebody answer this ...

3. can somebody tell me an idea how to do indexing for a zip file ?

1. while sending docx, we are getting following error.





Re: tika integration exception and other related queries

2011-06-08 Thread Gary Taylor

Naveen,

For indexing Zip files with Tika, take a look at the following thread :

http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html

I got it to work with the 3.1 source and a couple of patches.

Hope this helps.

Regards,
Gary.


On 08/06/2011 04:12, Naveen Gupta wrote:

Hi Can somebody answer this ...

3. can somebody tell me an idea how to do indexing for a zip file ?

1. while sending docx, we are getting following error.




Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)

2011-05-23 Thread Gary Taylor

Jayendra,

I cleared out my local repository, and replayed all of my steps from 
Friday and it now it works.  The only difference (or the only one that's 
obvious to me) was that I applied the patch before doing a full 
compile/test/dist.  But I assumed that given I was seeing my new log 
entries (from ExtractingDocumentLoader.java) I was running the correct 
code anyway.


However, I'm very pleased that it's working now - I get the full 
contents of the zipped files indexed and not just the file names.


Thank you again for your assistance, and the patch!

Kind regards,
Gary.


On 21/05/2011 03:12, Jayendra Patil wrote:

Hi Gary,

 I tried the patch on the 3.1 source code (@
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/)
as well and it worked fine.
@Patch - https://issues.apache.org/jira/browse/SOLR-2416, which deals
with the Solr Cell module.

You may want to verify the contents from the results by enabling the
stored attribute on the text field.

e.g. curl "http://localhost:8983/solr/update/extract?stream.file=C:/Test.zip&literal.id=777045&literal.title=Test&commit=true"

Let me know if it works. I would be happy to share the generated
artifact you can test on.

Regards,
Jayendra




Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)

2011-05-20 Thread Gary Taylor
Hello again.  Unfortunately, I'm still getting nowhere with this.  I 
have checked-out the 3.1 source and applied Jayendra's patches (see 
below) and it still appears that the contents of the files in the 
zipfile are not being indexed, only the filenames of those contained files.


I'm using a simple CURL invocation to test this:

curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&fmap.content=text&literal.type=5" -F commit=true -F file=@solr1.zip


solr1.zip contains two simple txt files (doc1.txt and doc2.txt).  I'm 
expecting the contents of those txt files to be extracted from the zip 
and indexed, but this isn't happening - or at least, I don't get the 
desired result when I do a query afterwards.  I do get a match if I 
search for either doc1.txt or doc2.txt, but not if I search for a 
word that appears in their contents.


If I index one of the txt files (instead of the zipfile), I can query 
the content OK, so I'm assuming my query is sensible and matches the 
field specified on the CURL string (ie. text).  I'm also happy that 
the Solr Cell content extraction is working because I can successfully 
index PDF, Word, etc. files.


In a fit of desperation I have added log.info statements into the files 
referenced by Jayendra's patches (SOLR-2416 and SOLR-2332) and I see 
those in the log when I submit the zipfile with CURL, so I know I'm 
running those patched files in the build.


If anyone can shed any light on what's happening here, I'd be very grateful.

Thanks and kind regards,
Gary.


On 11/04/2011 11:12, Gary Taylor wrote:

Jayendra,

Thanks for the info - been keeping an eye on this list in case this 
topic cropped up again.  It's currently a background task for me, so 
I'll try and take a look at the patches and re-test soon.


Joey - glad you brought this issue up again.  I haven't progressed any 
further with it.  I've not yet moved to Solr 3.1 but it's on my to-do 
list, as is testing out the patches referenced by Jayendra.  I'll post 
my findings on this thread - if you manage to test the patches before 
me, let me know how you get on.


Thanks and kind regards,
Gary.


On 11/04/2011 05:02, Jayendra Patil wrote:

The migration of Tika to the latest 0.8 version seems to have
reintroduced the issue.

I was able to get this working again with the following patches. (Solr
Cell and Data Import handler)

https://issues.apache.org/jira/browse/SOLR-2416
https://issues.apache.org/jira/browse/SOLR-2332

You can try these.

Regards,
Jayendra

On Sun, Apr 10, 2011 at 10:35 PM, Joey Hanzel <phan...@nearinfinity.com>  wrote:

Hi Gary,

I have been experiencing the same problem... Unable to extract 
content from
archive file formats.  I just tried again with a clean install of 
Solr 3.1.0
(using Tika 0.8) and continue to experience the same results.  Did 
you have

any success with this problem with Solr 1.4.1 or 3.1.0 ?

I'm using this curl command to send data to Solr.
curl "http://localhost:8080/solr/update/extract?literal.id=doc1&fmap.content=attr_content&commit=true" -H application/octet-stream -F myfile=@data.zip

No problem extracting single rich text documents, but archive files 
only

result in the file names within the archive being indexed. Am I missing
something else in my configuration? Solr doesn't seem to be 
unpacking the
archive files. Based on the email chain associated with your first 
message,
some people have been able to get this functionality to work as 
desired.










--
Gary Taylor
INOVEM

Tel +44 (0)1488 648 480
Fax +44 (0)7092 115 933
gary.tay...@inovem.com
www.inovem.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE



Seattle Solr/Lucene User Group?

2011-04-13 Thread Gary Yngve
Hi all,

Does anyone know if there is a Solr/Lucene user group /
birds-of-feather that meets in Seattle?

If not, I'd like to start one up.  I'd love to learn and share tricks
pertaining to NRT, performance, distributed solr, etc.

Also, I am planning on attending the Lucene Revolution!

Let's connect!

-Gary

http://www.linkedin.com/in/garyyngve


Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-11 Thread Gary Taylor

Jayendra,

Thanks for the info - been keeping an eye on this list in case this 
topic cropped up again.  It's currently a background task for me, so 
I'll try and take a look at the patches and re-test soon.


Joey - glad you brought this issue up again.  I haven't progressed any 
further with it.  I've not yet moved to Solr 3.1 but it's on my to-do 
list, as is testing out the patches referenced by Jayendra.  I'll post 
my findings on this thread - if you manage to test the patches before 
me, let me know how you get on.


Thanks and kind regards,
Gary.


On 11/04/2011 05:02, Jayendra Patil wrote:

The migration of Tika to the latest 0.8 version seems to have
reintroduced the issue.

I was able to get this working again with the following patches. (Solr
Cell and Data Import handler)

https://issues.apache.org/jira/browse/SOLR-2416
https://issues.apache.org/jira/browse/SOLR-2332

You can try these.

Regards,
Jayendra

On Sun, Apr 10, 2011 at 10:35 PM, Joey Hanzel <phan...@nearinfinity.com>  wrote:

Hi Gary,

I have been experiencing the same problem... Unable to extract content from
archive file formats.  I just tried again with a clean install of Solr 3.1.0
(using Tika 0.8) and continue to experience the same results.  Did you have
any success with this problem with Solr 1.4.1 or 3.1.0 ?

I'm using this curl command to send data to Solr.
curl "http://localhost:8080/solr/update/extract?literal.id=doc1&fmap.content=attr_content&commit=true" -H application/octet-stream -F myfile=@data.zip

No problem extracting single rich text documents, but archive files only
result in the file names within the archive being indexed. Am I missing
something else in my configuration? Solr doesn't seem to be unpacking the
archive files. Based on the email chain associated with your first message,
some people have been able to get this functionality to work as desired.






--
Gary Taylor
INOVEM

Tel +44 (0)1488 648 480
Fax +44 (0)7092 115 933
gary.tay...@inovem.com
www.inovem.com

INOVEM Ltd is registered in England and Wales No 4228932
Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE



Re: adding a document using curl

2011-03-03 Thread Gary Taylor

As an example, I run this in the same directory as the msword1.doc file:

curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&literal.type=5" -F file=@msword1.doc


The type literal is just part of my schema.

Gary.


On 03/03/2011 11:45, Ken Foskey wrote:

On Thu, 2011-03-03 at 12:36 +0100, Markus Jelsma wrote:

Here's a complete example
http://wiki.apache.org/solr/UpdateXmlMessages#Passing_commit_parameters_as_part_of_the_URL

I should have been clearer.   A rich text document,  XML I can make work
and a script is in the example docs folder

http://wiki.apache.org/solr/ExtractingRequestHandler

I also read the solr 1.4 book and tried samples in there,   could not
make them work.

Ta






Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-31 Thread Gary Taylor
Can anyone shed any light on this, and whether it could be a config 
issue?  I'm now using the latest SVN trunk, which includes the Tika 0.8 
jars.


When I send a ZIP file (containing two txt files, doc1.txt and doc2.txt) 
to the ExtractingRequestHandler, I get the following log entry 
(formatted for ease of reading) :


SolrInputDocument[
{
ignored_meta=ignored_meta(1.0)={
[stream_source_info, file, stream_content_type, 
application/octet-stream, stream_size, 260, stream_name, solr1.zip, 
Content-Type, application/zip]

},
ignored_=ignored_(1.0)={
[package-entry, package-entry]
},
ignored_stream_source_info=ignored_stream_source_info(1.0)={file},

ignored_stream_content_type=ignored_stream_content_type(1.0)={application/octet-stream}, 


ignored_stream_size=ignored_stream_size(1.0)={260},
ignored_stream_name=ignored_stream_name(1.0)={solr1.zip},
ignored_content_type=ignored_content_type(1.0)={application/zip},
docid=docid(1.0)={74},
type=type(1.0)={5},
text=text(1.0)={  doc2.txtdoc1.txt}
}
]

So, the data coming back from Tika when parsing a ZIP file does not 
include the file contents, only the names of the files contained 
therein.  I've tried forcing stream.type=application/zip in the CURL 
string, but that makes no difference.  If I specify an invalid 
stream.type then I get an exception response, so I know it's being used.


When I send one of those txt files individually to the 
ExtractingRequestHandler, I get:


SolrInputDocument[
{
ignored_meta=ignored_meta(1.0)={
[stream_source_info, file, stream_content_type, text/plain, 
stream_size, 30, Content-Encoding, ISO-8859-1, stream_name, doc1.txt]

},
ignored_stream_source_info=ignored_stream_source_info(1.0)={file},

ignored_stream_content_type=ignored_stream_content_type(1.0)={text/plain},

ignored_stream_size=ignored_stream_size(1.0)={30},
ignored_content_encoding=ignored_content_encoding(1.0)={ISO-8859-1},
ignored_stream_name=ignored_stream_name(1.0)={doc1.txt},
docid=docid(1.0)={74},
type=type(1.0)={5},
text=text(1.0)={The quick brown fox  }
}
]

and we see the file contents in the text field.

I'm using the following requestHandler definition in solrconfig.xml:

<!-- Solr Cell: http://wiki.apache.org/solr/ExtractingRequestHandler -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">

  <lst name="defaults">
    <!-- All the main content goes into "text"... if you need to return
         the extracted text or do highlighting, use a stored field. -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>

    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

Is there any further debug or diagnostic I can get out of Tika to help 
me work out why it's only returning the file names and not the file 
contents when parsing a ZIP file?


Thanks and kind regards,
Gary.



On 25/01/2011 16:48, Jayendra Patil wrote:

Hi Gary,

The latest Solr Trunk was able to extract and index the contents of the zip
file using the ExtractingRequestHandler.
The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and
worked pretty well.

Tested again with sample url and works fine -
curl "http://localhost:8080/solr/core0/update/extract?stream.file=C:/temp/extract/777045.zip&literal.id=777045&literal.title=Test&commit=true"


You would probably need to drill down to the Tika Jars and
the apache-solr-cell-4.0-dev.jar used for Rich documents indexing.

Regards,
Jayendra





Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor

Hi,

I posted a question in November last year about indexing content from 
multiple binary files into a single Solr document and Jayendra responded 
with a simple solution to zip them up and send that single file to Solr.


I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't 
currently allow this to work and only the file names of the zipped files 
are indexed (and not their contents).


I've tried downloading and building the latest Tika (0.8) and replacing 
the tika-parsers and tika-core JARs in 
solr-root\contrib\extraction\lib, but this still isn't indexing the 
file contents, and now it doesn't even index the file names!


Is there a version of Tika that works with the Solr 1.4.1 released 
distribution which does index the contents of the zipped files?


Thanks and kind regards,
Gary



Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor

Thanks Erlend.

Not used SVN before, but have managed to download and build latest trunk 
code.


Now I'm getting an error when trying to access the admin page (via 
Jetty) because I specify HTMLStripStandardTokenizerFactory in my 
schema.xml, but this appears to be no longer supplied as part of the 
build, so I get an exception because it can't find that class.  I've checked 
the CHANGES.txt and found the following in the change list for 1.4.0 (!?):


66. SOLR-1343: Added HTMLStripCharFilter and marked HTMLStripReader, 
HTMLStripWhitespaceTokenizerFactory and
HTMLStripStandardTokenizerFactory deprecated. To strip HTML tags, 
HTMLStripCharFilter can be used with an arbitrary Tokenizer. (koji)


Unfortunately, I can't seem to get that to work correctly.  Does anyone 
have an example fieldType stanza (for schema.xml) for stripping out HTML ?


Thanks and kind regards,
Gary.
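
A minimal sketch of the kind of stanza intended, assuming a Solr build that 
ships HTMLStripCharFilterFactory; the type name text_html and the 
tokenizer/filter chain after it are placeholders to adapt:

<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The charFilter element runs before the tokenizer, so the HTML tags are stripped 
from the character stream before any tokenizing happens.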



On 25/01/2011 14:17, Erlend Garåsen wrote:

On 25.01.11 11.30, Erlend Garåsen wrote:


Tika version 0.8 is not included in the latest release/trunk from SVN.


Ouch, I wrote "not" instead of "now". Sorry, I replied in a hurry.

And to clarify, by content I mean the main content of a Word file. 
Title and other kinds of metadata are successfully extracted by the 
old 0.4 version of Tika, but you need a newer Tika version (0.8) in 
order to fetch the main content as well. So try the newest Solr 
version from trunk.


Erlend






Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor

OK, got past the schema.xml problem, but now I'm back to square one.

I can index the contents of binary files (Word, PDF etc...), as well as 
text files, but it won't index the content of files inside a zip.


As an example, I have two txt files - doc1.txt and doc2.txt.  If I index 
either of them individually using:


curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&fmap.content=text&literal.type=5" -F file=@doc1.txt


and commit, Solr will index the contents and searches will match.

If I zip those two files up into solr1.zip, and index that using:

curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&fmap.content=text&literal.type=5" -F file=@solr1.zip


and commit, the file names are indexed, but not their contents.

I have checked that Tika can correctly process the zip file when used 
standalone with the tika-app jar - it outputs both the filenames and 
contents.  Should I be able to index the contents of files stored in a 
zip by using extract ?


Thanks and kind regards,
Gary.


On 25/01/2011 15:32, Gary Taylor wrote:

Thanks Erlend.

Not used SVN before, but have managed to download and build latest 
trunk code.


Now I'm getting an error when trying to access the admin page (via 
Jetty) because I specify HTMLStripStandardTokenizerFactory in my 
schema.xml, but this appears to be no-longer supplied as part of the 
build so I get an exception cos it can't find that class.  I've 
checked the CHANGES.txt and found the following in the change list to 
1.4.0 (!?) :


66. SOLR-1343: Added HTMLStripCharFilter and marked HTMLStripReader, 
HTMLStripWhitespaceTokenizerFactory and
HTMLStripStandardTokenizerFactory deprecated. To strip HTML tags, 
HTMLStripCharFilter can be used with an arbitrary Tokenizer. (koji)


Unfortunately, I can't seem to get that to work correctly.  Does 
anyone have an example fieldType stanza (for schema.xml) for stripping 
out HTML ?


Thanks and kind regards,
Gary.



On 25/01/2011 14:17, Erlend Garåsen wrote:

On 25.01.11 11.30, Erlend Garåsen wrote:


Tika version 0.8 is not included in the latest release/trunk from SVN.


Ouch, I wrote "not" instead of "now". Sorry, I replied in a hurry.

And to clarify, by content I mean the main content of a Word file. 
Title and other kinds of metadata are successfully extracted by the 
old 0.4 version of Tika, but you need a newer Tika version (0.8) in 
order to fetch the main content as well. So try the newest Solr 
version from trunk.


Erlend








example schema in branch_3x returns SEVERE errors

2010-11-27 Thread Gary Yngve
logs grep SEVERE solr.err.log
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.KeywordMarkerFilterFactory'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.KeywordMarkerFilterFactory'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.KeywordMarkerFilterFactory'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.EnglishMinimalStemFilterFactory'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.PointType'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.LatLonType'
SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.GeoHashField'
SEVERE: java.lang.RuntimeException: schema fieldtype
text(org.apache.solr.schema.TextField) invalid
arguments:{autoGeneratePhraseQueries=true}
SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'location'
specified on field store

It looks like it's loading the correct files...

010-11-27 13:01:28.005:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
2010-11-27 13:01:28.137:INFO::jetty-6.1.22
2010-11-27 13:01:28.204:INFO::Extract
file:/Users/gyngve/git/gems/solr_control/solr_server/webapps/apache-solr-3.1-SNAPSHOT.war
to
/Users/gyngve/git/gems/solr_control/solr_server/work/Jetty_0_0_0_0_8983_apache.solr.3.1.SNAPSHOT.war__apache.solr.3.1.SNAPSHOT__4jaonl/webapp

And on inspection on the war and the solr-core jar inside, I can see the
missing classes, so I am pretty confused.

Has anyone else seen this before or have an idea on how to surmount it?

I'm not quite ready to file a Jira issue on it yet, as I'm hoping it's user
error.

Thanks,
Gary


Re: example schema in branch_3x returns SEVERE errors

2010-11-27 Thread Gary Yngve
Sorry, false alarm.  Had a bad merge and had a stray library linking to an
older version of another library.  Works now.

-Gary


On Sat, Nov 27, 2010 at 4:17 PM, Gary Yngve gary.yn...@gmail.com wrote:

 logs grep SEVERE solr.err.log
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'solr.KeywordMarkerFilterFactory'
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'solr.KeywordMarkerFilterFactory'
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'solr.KeywordMarkerFilterFactory'
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'solr.EnglishMinimalStemFilterFactory'
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'solr.PointType'
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'solr.LatLonType'
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'solr.GeoHashField'
 SEVERE: java.lang.RuntimeException: schema fieldtype
 text(org.apache.solr.schema.TextField) invalid
 arguments:{autoGeneratePhraseQueries=true}
 SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'location'
 specified on field store

 It looks like it's loading the correct files...

 010-11-27 13:01:28.005:INFO::Logging to STDERR via
 org.mortbay.log.StdErrLog
 2010-11-27 13:01:28.137:INFO::jetty-6.1.22
 2010-11-27 13:01:28.204:INFO::Extract
 file:/Users/gyngve/git/gems/solr_control/solr_server/webapps/apache-solr-3.1-SNAPSHOT.war
 to
 /Users/gyngve/git/gems/solr_control/solr_server/work/Jetty_0_0_0_0_8983_apache.solr.3.1.SNAPSHOT.war__apache.solr.3.1.SNAPSHOT__4jaonl/webapp

 And on inspection on the war and the solr-core jar inside, I can see the
 missing classes, so I am pretty confused.

 Has anyone else seen this before or have an idea on how to surmount it?

 I'm not quite ready to file a Jira issue on it yet, as I'm hoping it's user
 error.

 Thanks,
 Gary



Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Gary Taylor

Hi,

We're trying to use Solr to replace a custom Lucene server.  One 
requirement we have is to be able to index the content of multiple 
binary files into a single Solr document.  For example, a uniquely named 
object in our app can have multiple attached-files (eg. Word, PDF etc.), 
and we want to index (but not store) the contents of those files in the 
single Solr doc for that named object.


At the moment, we're issuing HTTP requests direct from ColdFusion and 
using the /update/extract servlet, but can only specify a single file on 
each request.


Is the best way to achieve this to extend ExtractingRequestHandler to 
allow multiple binary files and thus specify our own RequestHandler, or 
would using the SolrJ interface directly be a better bet, or am I 
missing something fundamental?


Thanks and regards,
Gary.


Re: Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Gary Taylor
Jayendra,

Brilliant! A very simple solution. Thank you for your help.

Kind regards,
Gary


On 17 Nov 2010 22:09, Jayendra Patil <jayendra.patil@gmail.com> wrote:

The way we implemented the same scenario is zipping all the attachments into
a single zip file which can be passed to the ExtractingRequestHandler for
indexing and included as a part of a single Solr document.

Regards,
Jayendra

On Wed, Nov 17, 2010 at 6:27 AM, Gary Taylor <g...@inovem.com> wrote:

> Hi,
>
> We're trying to use Solr to replace a custom Lucene server.  One
> requirement we have is to be able to index the content of multiple binary
> files into a single Solr document.  For example, a uniquely named object in
> our app can have multiple attached-files (eg. Word, PDF etc.), and we want
> to index (but not store) the contents of those files in the single Solr doc
> for that named object.
>
> At the moment, we're issuing HTTP requests direct from ColdFusion and using
> the /update/extract servlet, but can only specify a single file on each
> request.
>
> Is the best way to achieve this to extend ExtractingRequestHandler to allow
> multiple binary files and thus specify our own RequestHandler, or would
> using the SolrJ interface directly be a better bet, or am I missing
> something fundamental?
>
> Thanks and regards,
> Gary.
>



Re: synonyms not working with copyfield

2010-05-13 Thread Gary
Hi Surajit
I'm not sure if this is any help, but I had a similar problem with stop 
words: they were not working with dismax queries. To cut a long story short, it 
seems that all the queried fields need to be configured with stopwords.

Maybe this has a similar effect with the synonyms configuration, thus your 
copyField should be defined as a type that is configured with the 
SynonymFilterFactory, just like 
person_name.

You can find some guidance here:

http://bibwild.wordpress.com/2010/04/14/solr-stop-wordsdismax-gotcha/

Gary





Re: Strange NPE with SOLR-236 (Field collapsing)

2010-05-12 Thread Gary
Hi Eric
 
I catch the NPE in the NonAdjacentDocumentCollapser class and now it does
return the data with the field collapsed.

However, I cannot promise how accurate or correct this fix is because I have not
had a lot of time to study all the code.

It would be best if some of the experts could give us a clue.

I made the change in
solr/src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java,
inner class FloatValueFieldComparator.




Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread gary


http://www.webtide.com/choose/jetty.jsp

  - Original Message -
  From: Steve Radhouani r.steve@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Tuesday, 16 February, 2010 12:38:04 PM
  Subject: Tomcat vs Jetty: A Comparative Analysis?
 
  Hi there,
 
  Is there any analysis out there that may help to choose between Tomcat
 and
  Jetty to deploy Solr? I wonder wether there's a significant difference
  between them in terms of performance.
 
  Any advice would be much appreciated,
  -Steve
 



Tomcat6 env-entry

2007-12-04 Thread Gary Harris
It works excellently in Tomcat 6. The toughest thing I had to deal with is 
discovering that the environment variable in web.xml for solr/home is 
essential. If you skip that step, it won't come up.


   <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-type>java.lang.String</env-entry-type>
      <env-entry-value>F:\Tomcat-6.0.14\webapps\solr</env-entry-value>
   </env-entry>

- Original Message - 
From: Charlie Jackson [EMAIL PROTECTED]

To: solr-user@lucene.apache.org
Sent: Monday, December 03, 2007 11:35 AM
Subject: RE: Tomcat6?


$CATALINA_HOME/conf/Catalina/localhost doesn't exist by default, but you can 
create it and it will work exactly the same way it did in Tomcat 5. It's not 
created by default because it's not needed by the manager webapp anymore.



-Original Message-
From: Matthew Runo [mailto:[EMAIL PROTECTED]
Sent: Monday, December 03, 2007 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Tomcat6?

In context.xml, I added..

<Environment name="/solr/home" value="/Users/mruno/solr-src/example/solr" type="java.lang.String" />

I think that's all I did to get it working in Tocmat 6.

--Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote:


The Solr wiki does not describe how to install Solr on
Tomcat 6, and I have not managed it myself :(
In the chapter Configuring Solr Home with JNDI the directory
$CATALINA_HOME/conf/Catalina/localhost is mentioned, which does not
exist with Tomcat 6.

Alternatively I tried the folder $CATALINA_HOME/work/Catalina/
localhost, but with no success (I can query the top-level page,
but the Solr Admin link then does not work).

Can anybody help?

--
Dipl.-Inf. Jörg Kiegeland
ikv++ technologies ag
Bernburger Strasse 24-25, D-10963 Berlin
e-mail: [EMAIL PROTECTED], web: http://www.ikv.de
phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0
=
Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg
board of  directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO)
supervising board: Prof. Dr. Bernd Mahr (chairman)
_









RE: Null pointer exception

2007-05-14 Thread Gary Browne
Thanks a lot for your reply Chris

I am running v1.1.0. If I do a search (from the admin page), it throws
the following exception:

java.lang.RuntimeException: java.io.IOException:
/var/www/html/solr/data/index not a directory

There are no exceptions on starting Tomcat, only one warning regarding
a JMS client lib not found (related to Cocoon). I have a file named
solr.xml in my $TOMCAT_HOME/conf/Catalina/localhost directory containing
the following:

<Context docBase="/usr/local/tomcat/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr" type="java.lang.String" value="/var/www/html/solr" override="true" />
</Context>

I am using the example configs (unmodified).

Thanks again
Gary


Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946 
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 15 May 2007 7:27 AM
To: solr-user@lucene.apache.org
Subject: Re: Null pointer exception

: I have tried indexing from the exampledocs which is just sitting in my
: user home directory but now I get a null pointer exception after
: running:

just to clarify: are you using solr 1.1 or a nightly build? did you
check
the log file to ensure thatthere are no exceptions when you start
tomcat?
are you using the example solrconfig.xml and schema.xml?  have you tried
doing a search first without indexing any docs to see if that executs
and
(correctly) returns 0 docs?

If i had to guess, i'd speculate that you aren't correctly using a
system
prop or JNDI to point Solr at your solr home dir, so it's not finding
the
configs; either that, or you've modified the configs and there is a
syntax error -- either way there should be an exception when the server
starts up, well before you update any docs.


-Hoss



RE: Null pointer exception

2007-05-14 Thread Gary Browne
Hi Chris

The /var/www/html/solr/data/ directory did exist. I tried opening up
permissions completely for testing but no luck (the tomcat user had
write permissions).

I decided to trash the whole installation and start again. I downloaded
last night's build and untarred it. Put the .war into
$TOMCAT_HOME/webapps. Copied the example/solr directory as
/var/www/html/solr. No JNDI file this time, just updated solrconfig to
read /var/www/html/solr as my data.dir.

I can access the admin page but when I try an index action from the
commandline, or a search from the admin page, I get something like:

The requested resource (/solr/select/) is not available

I have other apps running under Tomcat okay; it seems like it can't find
the lib .jars or can't access the classes within them?

Stuck...

Cheers
Gary



Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946 

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 15 May 2007 9:51 AM
To: solr-user@lucene.apache.org
Subject: RE: Null pointer exception

: I am running v1.1.0. If I do a search (from the admin page), it throws
: the following exception:
:
: java.lang.RuntimeException: java.io.IOException:
: /var/www/html/solr/data/index not a directory

does /var/www/html/solr/data/ exist? ... if so does the effective userID
for tomcat have permission to write to it?  if not does the effective
userID for tomcat have permission to write to /var/www/html/solr/ ?



-Hoss



Null pointer exception

2007-05-13 Thread Gary Browne
Hi All

 

Thanks very much for your help with indexing setup.

 

I should elucidate my directory/file setup just to check that I have
everything in the right place.

 

Under $TOMCAT_HOME/webapps I have the solr directory containing the
admin, WEB-INF and META-INF directories.

 

Under my web root I have the solr directory containing the bin, conf and
data directories.

 

I have tried indexing from the exampledocs which is just sitting in my
user home directory but now I get a null pointer exception after
running:

 

./post.sh solr.xml

 

Can anyone offer advice on this please? (I've attached the trace for
reference)

 

Thanks again

Gary

 

 

 

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946 

 

May 14, 2007 1:17:34 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.NullPointerException
at org.apache.solr.core.SolrCore.update(SolrCore.java:716)
at 
org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:595)

May 14, 2007 1:17:34 PM org.apache.solr.core.SolrException log
SEVERE: Exception during commit/optimize:java.lang.NullPointerException
at org.apache.solr.core.SolrCore.update(SolrCore.java:763)
at 
org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)


Still having indexing problems

2007-05-11 Thread Gary Browne
Hello

 

 

I have tried indexing the example files using the Jetty method, rather
than Tomcat, which still didn't work. I would prefer to use my Tomcat
URL.

 

After starting Jetty, I issued

 

java -jar post.jar http://localhost:8983/solr/update solr.xml monitor.xml

 

as in the examples on the tutorial, but post.jar cannot be found...

 

Where is it? Is there a path variable I need to set up somewhere?

 

 

Any help greatly appreciated.

 

 

Regards,

 

Gary

 

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946
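
On the post.jar question: it is not on any path by default. In the example
distribution it sits alongside the sample documents, so a sketch assuming the
stock layout (the tool posts to http://localhost:8983/solr/update by default):

cd example/exampledocs
java -jar post.jar solr.xml monitor.xml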