Re: How can I index MS-Outlook files?

2008-12-23 Thread Jeryl Cook
http://www.aduna-software.com/technologies/aperture/overview.view

This component, Aperture, worked for me.

Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


On Tue, Dec 23, 2008 at 7:42 PM, Norberto Meijome numard...@gmail.com wrote:
 On Sun, 14 Dec 2008 19:22:00 -0800 (PST)
 Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

 Perhaps an easier alternative is to index not the MS-Outlook files
 themselves, but email messages pulled from the IMAP or POP servers, if that's
 where the original emails live.

 PST files ('outlook files') are local to the end user and quite possibly their
 contents aren't available in the server anymore.

 Another alternative could be to access, from Exchange's
 file system itself, the files that represent each object... I don't know
 whether this is still possible in Exchange 2007, or whether it is 'sanctioned'
 by MS... Possibly some kind of object interface with exchange itself would be
 most desirable


 _
 {Beto|Norberto|Numard} Meijome

 FAST, CHEAP, SECURE: Pick Any TWO

 I speak for myself, not my employer. Contents may be hot. Slippery when wet.
 Reading disclaimers makes you go blind. Writing them is worse. You have been
 Warned.






Re: [ANNOUNCE] Solr Logo Contest Results

2008-12-18 Thread Jeryl Cook
Looks cool :). How about a talking mascot as well?

Jeryl Cook
twoenc...@gmail.com

On Thu, Dec 18, 2008 at 1:38 PM, Mathijs Homminga
mathijs.hommi...@knowlogy.nl wrote:
 Good choice!

 Mathijs Homminga

 Chris Hostetter wrote:

 (replies to solr-user please)

 On behalf of the Solr Committers, I'm happy to announce that the Solr
 Logo Contest is officially concluded. (Woot!)

 And the Winner Is...

 https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
 ...by Michiel

 We ran into a few hiccups during the contest, making it take longer than
 intended, but the result was a thorough process in which everyone went above
 and beyond to ensure that the final choice best reflected the wishes of the
 community.

 You can expect to see the new logo appear on the site (and in the Solr
 app) in the next few weeks.

 Congrats Michiel!


 -Hoss


 --
 Knowlogy
 Helperpark 290 C
 9723 ZA Groningen
 +31 (0)50 2103567
 http://www.knowlogy.nl

 mathijs.hommi...@knowlogy.nl
 +31 (0)6 15312977






-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: Solr on Solaris

2008-12-05 Thread Jeryl Cook
You're out of memory :).

Each application server instance can typically only allocate around 1024 MB to the JVM; to take advantage of the machine's memory you need to run multiple instances of the application server.

Are you using RAMDirectory with SOLR?

On Thu, Dec 4, 2008 at 10:40 PM, Kashyap, Raghu
[EMAIL PROTECTED] wrote:
 We are running solr on a solaris box with 4 CPU's(8 cores) and  3GB Ram.
 When we try to index sometimes the HTTP Connection just hangs and the
 client which is posting documents to solr doesn't get any response back.
 We since then have added timeouts to our http requests from the clients.



 I then get this error.



 java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new. Out
 of swap space?

 java.lang.OutOfMemoryError: unable to create new native thread

 Exception in thread JmxRmiRegistryConnectionPoller
 java.lang.OutOfMemoryError: unable to create new native thread



 We are running JDK 1.6_10 on the solaris box. . The weird thing is we
 are running the same application on linux box with JDK 1.6 and we
 haven't seen any problem like this.



 Any suggestions?



 -Raghu





-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: Mock solr server

2008-11-27 Thread Jeryl Cook
Are you trying to unit test something? I would simply make use of the
embedded SOLR component in your unit tests.

On 11/27/08, Robert Young [EMAIL PROTECTED] wrote:
 Hi,

 Does anyone know of an easy to use Mock solr server?

 Thanks
 Rob



-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: EmbeddedSolrServer questions

2008-11-18 Thread Jeryl Cook
I am using EmbeddedSolrServer and simply have a queue that documents
are sent to, and a listener on that queue that writes them to the
index.

Or just keep it simple, and put a synchronized block around the
method in the write server that writes the document to the index.
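Both approaches (a queue with a single consuming listener, or a synchronized write method) can be sketched in plain Java; the String "documents" and in-memory "index" below are stand-ins for real Solr documents and the embedded index, not Solr classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: serialize index writes so many producer threads can safely
// feed a single embedded index. Strings stand in for real documents.
public class SingleWriterQueue {
    private static final String STOP = "__STOP__";            // shutdown sentinel
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> index = new ArrayList<>();     // stands in for the Solr index
    private final Thread consumer;

    public SingleWriterQueue() {
        // The "listener on that queue": a single consumer thread drains
        // documents and writes them, so writes never overlap.
        consumer = new Thread(() -> {
            try {
                while (true) {
                    String doc = queue.take();
                    if (doc.equals(STOP)) return;
                    write(doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
    }

    // The alternative "keep it simple" route: synchronize the write itself.
    private synchronized void write(String doc) {
        index.add(doc);
    }

    public void submit(String doc) {
        queue.add(doc);
    }

    public List<String> shutdownAndGet() {
        queue.add(STOP);
        try {
            consumer.join();  // join gives happens-before visibility of the index
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return index;
    }
}
```

With EmbeddedSolrServer, the `write` body would call `server.add(doc)` instead of appending to a list; the concurrency structure is the same.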

Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001

On Tue, Nov 18, 2008 at 9:36 AM, Thierry Templier [EMAIL PROTECTED] wrote:
 Hello,

 I have some questions regarding the use of the EmbeddedSolrServer in order to 
 embed a solr instance into a Java application.

 1°) Is an instance of the EmbeddedSolrServer class threadsafe when used by 
 several concurent threads?

 2°) Regarding to transactions, can an instance of the EmbeddedSolrServer 
 class be used in order to make two transactions in the same time by two 
 different threads?

 Thanks for your help,
 Thierry










Max Number of Facets

2008-10-30 Thread Jeryl Cook
Is there a limit on the number of facets that I can create in
Solr (dynamically generated facets)?

-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: Max Number of Facets

2008-10-30 Thread Jeryl Cook
I understand what you mean. I am building a system that will
dynamically generate facets, which could possibly number in the
thousands, but at most about 6 or 7 facets will be returned using a
facet-ranking algorithm. So I get what you mean: if I request 1,000
facets back in my query, compared to just 6 or 7, I could take a
performance hit.
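The top-facets idea can be sketched as a simple count-based ranking; plain frequency stands in here for whatever the real facet-ranking algorithm is, and the `*_s` field names are illustrative:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: given counts for dynamically generated facet fields, keep only
// the top-k highest-ranked ones instead of requesting thousands back.
// Frequency is a stand-in ranking; a real system might weight by relevance.
public class FacetRanker {
    public static List<String> topFacets(Map<String, Integer> counts, int k) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

The query would then call `addFacetField` only for the handful of names this returns, keeping response times bounded.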

On 10/30/08, Ryan McKinley [EMAIL PROTECTED] wrote:
 the only 'limit' is the effect on your query times...  you could have
 1000+ facets if you are ok with the response time.

 Sorry to give the it depends answer, but it totally depends on your
 data and your needs.



 On Oct 30, 2008, at 7:28 AM, Jeryl Cook wrote:

 is there a limit on the number of facets that i can create in
 Solr?(dynamically generated facets.)

 --
 Jeryl Cook
 /^\ Pharaoh /^\
 http://pharaohofkush.blogspot.com/
 Whether we bring our enemies to justice, or bring justice to our
 enemies, justice will be done.
 --George W. Bush, Address to a Joint Session of Congress and the
 American People, September 20, 2001




-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: Max Number of Facets

2008-10-30 Thread Jeryl Cook
Wow, 30k in under 3 seconds.

On 10/30/08, Stephen Weiss [EMAIL PROTECTED] wrote:
 I've actually seen cases on our site where it's possible to bring up
 over 30,000 facets for one query.  And they actually come up quickly -
 like, 3 seconds.  It takes longer for the browser to render them.

 --
 Steve

 On Oct 30, 2008, at 6:04 PM, Ryan McKinley wrote:

 the only 'limit' is the effect on your query times...  you could
 have 1000+ facets if you are ok with the response time.

 Sorry to give the it depends answer, but it totally depends on
 your data and your needs.



 On Oct 30, 2008, at 7:28 AM, Jeryl Cook wrote:

 is there a limit on the number of facets that i can create in
 Solr?(dynamically generated facets.)

 --
 Jeryl Cook
 /^\ Pharaoh /^\
 http://pharaohofkush.blogspot.com/
 Whether we bring our enemies to justice, or bring justice to our
 enemies, justice will be done.
 --George W. Bush, Address to a Joint Session of Congress and the
 American People, September 20, 2001





-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Using filter to search in SOLR 1.3 with solrj

2008-10-02 Thread Jeryl Cook
I can execute what I want simply by using Lucene directly:

Hits hits = searcher.search(customScoreQuery, myQuery.getFilter());


However, I can't find the right class or method in the SOLR API to do
this against the searcher.
I am using the SolrServer (embedded version) to execute the query:


QueryResponse queryResponse = SolrServer.query(customScoreQuery);
// will work, BUT I NEED to use the filter as well...


Thanks


-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: Using filter to search in SOLR 1.3 with solrj

2008-10-02 Thread Jeryl Cook
I don't have issues adding a filter query to a SolrQuery...

I guess I'll look at the source code. I just need to pass a custom
Filter object at runtime before I execute a search using the
SolrServer. Currently, this is all I can do with SOLR:

SolrServer.query(customScoreQuery);

I need a method that would accept this:

searcher.search(customScoreQuery, myFilter);

like I am able to do using the Lucene searcher.
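For context, a filter in Solr's request API travels as the `fq` request parameter (which is what SolrJ's `addFilterQuery` sets), so a custom Lucene Filter object has no wire representation to hand to `SolrServer.query`. A minimal sketch of how a query plus filters serializes; `q`/`fq` are Solr's real parameter names, while the builder class itself is purely illustrative:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

// Sketch: build the query string Solr would receive for a main query
// plus filter queries. Parameter names q/fq match Solr's HTTP API;
// the builder itself is just for illustration.
public class SolrQueryString {
    public static String build(String q, List<String> filterQueries) {
        try {
            StringBuilder sb = new StringBuilder("q=")
                    .append(URLEncoder.encode(q, "UTF-8"));
            for (String fq : filterQueries) {
                sb.append("&fq=").append(URLEncoder.encode(fq, "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);  // UTF-8 is always supported
        }
    }
}
```

Anything that can't be expressed as such a parameter (like an arbitrary Filter instance) has to live server-side, which is why a custom plugin is the suggested route below.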



On Thu, Oct 2, 2008 at 1:43 PM, Ryan McKinley [EMAIL PROTECTED] wrote:
 what about:

SolrQuery query = ...;
query.addFilterQuery( type:xxx );


 On Oct 2, 2008, at 1:23 PM, Jeryl Cook wrote:

 i can execute what i want simply with using lucene directly

 Hits hits = searcher.search(customScoreQuery, myQuery.getFilter());


 howerver, i can't find the right Class , or method in the API to do
 this for SOLR  the searcher
 I am using the SOLRServer(Embeded version) to execute the .query...


 QueryResponse queryResponse = SolrServer.query(customScoreQuery);
 //will work, BUT I NEED to use the filter as well...


 Thanks


 --
 Jeryl Cook
 /^\ Pharaoh /^\
 http://pharaohofkush.blogspot.com/
 Whether we bring our enemies to justice, or bring justice to our
 enemies, justice will be done.
 --George W. Bush, Address to a Joint Session of Congress and the
 American People, September 20, 2001





-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: Using filter to search in SOLR 1.3 with solrj

2008-10-02 Thread Jeryl Cook
I see.
It would be nice to build the component within the code,
programmatically, rather than as a component added to the
configuration file, but I will read the docs on how to do this.

Thanks

On Thu, Oct 2, 2008 at 2:37 PM, Ryan McKinley [EMAIL PROTECTED] wrote:

 On Oct 2, 2008, at 2:24 PM, Jeryl Cook wrote:

 I don't have issues adding a filter query to a SolrQuery...

 i guess ill look at the source code, i just need to pass the a custom
 Filter object at runtime before i execute a search using the
 SolrServer..
 currently this is all i can do the below with SOLR...
 SolrServer.query(customScoreQuery);

 i need a method that would accept this:
 searcher.search(customScoreQuery, myfilter ); , like i am able todo
 using lucene searcher.


 aaah -- that lands you in custom plugin territory...

 perhaps look at building a QueryComponent

 ryan





-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: What's the bottleneck?

2008-09-12 Thread Jeryl Cook
I think you should just break up your index across boxes and do a
federated search across them,
since you mentioned you have a single machine.
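Federated search across boxes ultimately means merging per-shard result lists by score; a minimal sketch under the assumption that each shard returns hits already scored (Hit is a stand-in type, not a Solr class):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: merge top-k results from independent shard indexes by score,
// the core of a federated search. Hit is a stand-in for a real result type.
public class ShardMerger {
    public static class Hit {
        public final String id;
        public final double score;
        public Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    public static List<Hit> mergeTopK(List<List<Hit>> shardResults, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shard : shardResults) {
            all.addAll(shard);            // gather hits from every shard
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

A real deployment would fan the query out to each box in parallel and ask each shard for only its own top k before merging; the merge step itself stays this simple.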

Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


On Thu, Sep 11, 2008 at 3:58 PM, Jason Rennie [EMAIL PROTECTED] wrote:
 On Thu, Sep 11, 2008 at 1:29 PM, [EMAIL PROTECTED] wrote:

 what is your index configuration???


 Not sure what you mean.  We're using 1.2, though we've tested with a recent
 nightly and didn't see a significant change in performance...


 What is your average size form the returned fields ???


 Returned fields are relatively small, ~200 characters total per document.
 We're requesting the top 10 or so docs.

 How much memory have your System ??


 8g.  We give the jvm a 2g (max) heap.  We have another solr running on the
 same box also w/ 2g heap.  The Linux kernel caches ~2.5g of disk.


 Do you have long fieds who is returned in the queries ?


 No.  The searched and returned fields are relatively short.  One
 searched-over (but not returned) field can get up to a few hundred
 characters, but it's safe to assume they're all  1k.


 Do you have actívate the Highlighting in the request ?


 Nope.


 Are you using multi-value filed for filter ...


 No, it does not have the multiValue attribute turned on.  The qf field is
 just an integer.

 Any thoughts/comments are appreciated.

 Thanks,

 Jason






Re: Update schema.xml without restarting Solr?

2008-03-26 Thread Jeryl Cook
Two often-requested features:
1. Make it an option to use RAMDirectory, to hook in Terracotta
(billion(s) of items in an index, anyone? It would be possible using
this).
2. Make the schema.xml configurable at runtime. I'm not really sure of
the best way to address this, because changing the schema would require
re-indexing the documents.


Terracotta:
http://www.terracotta.org/

On Tue, Mar 25, 2008 at 11:27 AM,  [EMAIL PROTECTED] wrote:
 Hi,

  The wiki for Solr talks about the schema.xml, and it seems that
  changes in this file requires a restart of Solr before they have effect.

  In the wiki it says:

  
  How can I rebuild my index from scratch if I change my schema?

  The most efficient/complete way is to...

 1. Stop your application server
 2. Change your schema.xml file
 3. Delete the index directory in your data directory
 4. Start your application server (Solr will detect that there is
  no existing index and make a new one)
 5. Re-Index your data

  If the permission scheme of your server does not allow you to manually
  delete the index directory an alternate technique is...

 1. Stop your application server
 2. Change your schema.xml file
 3. Start your application server
 4. Use the match all docs query in a delete by query command:
  <delete><query>*:*</query></delete>
 5. Send an <optimize/> command.
 6. Re-Index your data
  

  Is this really the case? I find that quite strange that you need to
  restart solr for a change in the schema.xml. The way we plan to use
  Solr together with a Content Management System is that the
  authors/editors can create new article/document types when needed,
  without any need to restart anything. The CMS itself has full support
  for this. But we need Solr to also support this. Is that possible?
  Like a simple <reloadSchemaXml/> command, maybe, that would trigger
  Solr to re-read it's schema.xml file.

  If this is not possible to do, is it really necessary to restart the
  entire application server for a change in schema.xml to have effect?
  Or only the solr webapp?

  Regards
  /Jimi




-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
..Act your age, and not your shoe size.. -Prince(1986)


Re: Update schema.xml without restarting Solr?

2008-03-26 Thread Jeryl Cook
I wouldn't call the Terracotta approach magic (smile)... it's being
used quite a bit in many scalable, high-performing projects.

I personally used Terracotta and Lucene, and it worked, but I did not
try to cluster it with multiple Terracotta workers across nodes and a
Terracotta master; it was just a single box with two Tomcat instances.

However, talk is cheap. If I have the time over the next few weeks,
I'll put together a benchmark based on Terracotta and Lucene, with
maybe 3 nodes and 1 million documents.
Maybe some others can do the same :).

FYI: 
http://www.terracotta.org/confluence/display/tcforge/Proposal+-+Terracotta+for+Lucene

Jeryl Cook

On Wed, Mar 26, 2008 at 5:16 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Wed, Mar 26, 2008 at 4:41 PM, Ryan McKinley [EMAIL PROTECTED] wrote:
   just intuition - haven't tried it, so i'd love to be proved wrong.
Instrumenting Objects and magically passing them around seems like it
would be slower then a tuned approach used in SOLR-303.

  Yep, that's my sense too.  No magic solutions when it comes to scalability.

  -Yonik




-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
..Act your age, and not your shoe size.. -Prince(1986)


Re: RAM Based Index for Solr

2008-03-20 Thread Jeryl Cook
There is currently no way to use RAMDirectory instead of FSDirectory
in SOLR, however there is a feature request to implement this.
I personally think this would be great, because we could use Terracotta
to handle the clustering.

Jeryl Cook


On Thu, Mar 20, 2008 at 1:07 AM, Norberto Meijome [EMAIL PROTECTED] wrote:
 On Wed, 19 Mar 2008 17:04:34 -0700 (PDT)
  swarag [EMAIL PROTECTED] wrote:

   In Lucene there is a Ram Based Index
   org.apache.lucene.store.RAMDirectory.
   Is there a way to setup my index in solr to use a RAMDirectory?

  create a mountpoint on a ramdrive (tmpfs in linux, i think), and put your 
 index in there... ? or does lucene do anything other than that?

  B

  _
  {Beto|Norberto|Numard} Meijome

  Unix is very simple, but it takes a genius to understand the simplicity.
Dennis Ritchie

  I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
 Reading disclaimers makes you go blind. Writing them is worse. You have been 
 Warned.




-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
..Act your age, and not your shoe size.. -Prince(1986)


DynamicField and FacetFields..

2007-12-01 Thread Jeryl Cook
Question:
I need to add data to SOLR dynamically, so I do not have a predefined
list of field names...

So I use the dynamicField option in the schema and match the appropriate
datatype. In my schema.xml:
in my schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" />

<dynamicField name="*_s" type="string" indexed="true" stored="true"/>



Then, programmatically, my code:
...
document.addField( dynamicFieldName + "_s", dynamicFieldValue, 10 );
facetFieldNames.put( dynamicFieldName + "_s", null ); // TODO: use copyField
server.add( document, true );
server.commit();

When I attempt to graph the results, I want to display:

SolrQuery query = new SolrQuery();
query.setQuery( "*:*" );
query.setFacetLimit(10); // TODO:
Iterator facetsIt = facetFieldNames.entrySet().iterator();
while (facetsIt.hasNext()) {
    Entry<String,String> entry = (Entry) facetsIt.next();
    String facetName = (String) entry.getKey();
    query.addFacetField(facetName);
}

QueryResponse rsp;
rsp = server.query( query );
List<FacetField> facetFieldList = rsp.getFacetFields();
assertNotNull(facetFieldList);


My facetFieldList is null. Of course, if I addFacetField with "id" it
works, because I define it in the schema.xml.

Is this just something that is not implemented, or am I missing something?

Thanks.



Jeryl Cook 



/^\ Pharaoh /^\ 

http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

 Date: Fri, 30 Nov 2007 21:23:59 -0500
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Highlighting, word index
 
 It's good you already have the data because if you somehow got it from
 some sort of calculations I'd have to tell my product manager that
 the feature he wanted that I told him couldn't be done with our data
 was possible after all G...
 
 About page breaks:
 
 Another approach to paging is to index a special page token with an
 increment of 0 from the last word of the page. Say you have the following:
 last ctrl-l first. Then index last, $$$ at an increment of 0 then first.
 
 You can then quite quickly calculate the pages by using
 termdocs/termenum on your special token and count.
 
 Which approach you use depends upon whether you want span and/or
 phrase queries to match across page boundaries. If you use an increment as
 Mike suggests, matching last first~3 won't work. It just depends upon
 whether how you want to match across the page break.
 
 Best
 Erick
 
 On Nov 30, 2007 4:37 PM, Mike Klaas [EMAIL PROTECTED] wrote:
 
  On 30-Nov-07, at 1:02 PM, Owens, Martin wrote:
 
  
   Hello everyone,
  
   We're working to replace the old Linux version of dtSearch with
   Lucene/Solr, using the http requests for our perl side and java for
   the indexing.
  
   The functionality that is causing the most problems is the
   highlighting since we're not storing the text in solr (only
   indexing) and we need to highlight an image file (ocr) so what we
   really need is to request from solr the word indexes of the
   matches, we then tie this up to the ocr image and create html boxes
   to do the highlighting.
 
  This isn't possible with Solr out-of-the-box.  Also, the usual
  methods for highlighting won't work because Solr typically re-
  analyzes the raw text to find the appropriate highlighting points.
  However, it shouldn't be too hard to come up with a custom solution.
  You can tell lucene to store token offsets using TermVectors
  (configurable via schema.xml).  Then you can customize the request
  handler to return the token offsets (and/or positions) by retrieving
  the TVs.
 
   The text is also multi page, each page is seperated by Ctrl-L page
   breaks, should we handle the paging out selves or can Solr tell use
   which page the match happened on too?
 
  Again, not automatically.  However, if you wrote an analyzer that
  bumped up the position increment of tokens every time a new page was
  found (to, say the next multiple of 1000), then you infer the
  matching page by the token position.
 
  cheers,
  -Mike
 


RE: DynamicField and FacetFields..

2007-12-01 Thread Jeryl Cook
fixed, i had a typo...may want to delete my post( i want to :P .)

Jeryl Cook  


unsubscribe

2007-11-07 Thread Jeryl Cook
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
..Act your age, and not your shoe size.. -Prince(1986)

 From: [EMAIL PROTECTED]
 Subject: Re: start.jar -Djetty.port= not working
 Date: Wed, 7 Nov 2007 10:13:22 -0500
 To: solr-user@lucene.apache.org

 On Nov 7, 2007, at 10:07 AM, Mike Davies wrote:
  I'm using 1.2, downloaded from
  http://apache.rediris.es/lucene/solr/
  Where can i get the trunk version?

 svn, or http://people.apache.org/builds/lucene/solr/nightly/

RE: Any tips for indexing large amounts of data?

2007-10-31 Thread Jeryl Cook
Usability consideration:
Not really answering your question, but I must comment that faceted
navigation is very effective when searching over up to ~100k items, but
becomes much less effective past 100k. You may want to consider breaking
the 500k documents up into categories (a typical breadcrumb) of 100k
each for faceted browsing.
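The suggestion above is essentially pre-partitioning the corpus into browseable buckets; a minimal sketch of capping categories at a fixed size (the 100k figure comes from the comment above; class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a large corpus into categories of at most maxPerCategory
// documents, so faceted browsing stays effective within each bucket.
public class CorpusPartitioner {
    public static List<List<String>> partition(List<String> docs, int maxPerCategory) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += maxPerCategory) {
            int end = Math.min(i + maxPerCategory, docs.size());
            buckets.add(new ArrayList<>(docs.subList(i, end)));  // copy the slice
        }
        return buckets;
    }
}
```

In practice the split would follow a meaningful taxonomy (the breadcrumb categories) rather than raw position, but the size cap per browseable bucket is the point.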
 
 Jeryl Cook 



 To: solr-user@lucene.apache.org
 From: [EMAIL PROTECTED]
 Subject: Any tips for indexing large amounts of data?
 Date: Wed, 31 Oct 2007 10:30:50 -0400

 Hi,

 I am creating an index of approx 500K documents. I wrote an indexing
 program using embedded solr: http://wiki.apache.org/solr/EmbeddedSolr
 and am seeing probably a 10 fold increase in indexing speeds. My
 problem is though, that if I try to reindex say 20K docs at a time it
 slows down considerably. I currently batch my updates in lots of 100
 and between batches I close and reopen the connection to solr like so:

 private void openConnection(String environment) throws
     ParserConfigurationException, IOException, SAXException {
   System.setProperty("solr.solr.home", SOLR_HOME);
   solrConfig = new SolrConfig("solrconfig.xml");
   solrCore = new SolrCore(SOLR_HOME + "data/" + environment,
       solrConfig, new IndexSchema(solrConfig, "schema.xml"));
   logger.debug("Opened solr connection");
 }

 private void closeConnection() {
   solrCore.close();
   solrCore = null;
   logger.debug("Closed solr connection");
 }

 Does anyone have any pointers or see anything obvious I'm doing wrong?

 Thanks

 PS Sorry if this is posted twice.

RE: RAMDirectory

2007-09-22 Thread Jeryl Cook
Not yet implemented; hopefully soon:

http://jira.terracotta.org/jira/browse/CDV-399



Jeryl Cook 



/^\ Pharaoh /^\ 

http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

 Date: Sat, 22 Sep 2007 15:33:58 -0400
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: RAMDirectory
 
 HI,
 
 Does any know how to use RAM disk for index?
 
 Thanks,
 
 Jae Jo, 


RE: Solr and terracotta

2007-08-27 Thread Jeryl Cook
I had no problems with Terracotta; I've got a good handle on the product.

Maybe you all at Terracotta could lead the implementation of a patch to
SOLR that allows it to use RAMDirectory (via a setter), so Terracotta
can hook into the RAMDirectory, given the way Terracotta handles
clustering.

For those of you who are not familiar with Terracotta: it clusters the
JVM, and uses a master server to keep all the child servers synced.
This approach would allow SOLR to be clustered very easily (indexing on
1 node would index all nodes), not to mention a performance boost for
indexing, and perhaps searching. Also, it uses virtual memory, so the
number of documents stored in the RAMDirectory is only limited by space.
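The proposed setter hook is plain dependency injection of the index's backing directory; a minimal sketch with a stand-in directory interface (not Lucene's actual Directory class), showing where a clustering layer could supply a shared instance:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: make the index's backing directory pluggable via a setter, so a
// clustering layer (e.g. Terracotta sharing one instance across JVMs)
// can supply it. SimpleDirectory is a stand-in, not Lucene's Directory.
public class PluggableIndex {
    public interface SimpleDirectory {
        void store(String name, byte[] data);
        byte[] load(String name);
    }

    public static class RamDirectory implements SimpleDirectory {
        private final Map<String, byte[]> files = new ConcurrentHashMap<>();
        public void store(String name, byte[] data) { files.put(name, data); }
        public byte[] load(String name) { return files.get(name); }
    }

    private SimpleDirectory directory = new RamDirectory();  // local default

    // The proposed hook: let callers inject a (possibly clustered) directory.
    public void setDirectory(SimpleDirectory directory) {
        this.directory = directory;
    }

    public void index(String name, byte[] data) { directory.store(name, data); }
    public byte[] fetch(String name) { return directory.load(name); }
}
```

With such a setter in place, a tool like Terracotta could share the injected directory object across JVMs, so a write on one node becomes visible on all of them.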

Jeryl Cook 
/^\ Pharaoh /^\ 
http://pharaohofkush.blogspot.com/ 
 Date: Wed, 22 Aug 2007 14:46:19 -0700
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: RE: Solr and terracotta
 
 
 Jeryl,
 
 I remember you asking about how to hook in the RAMDirectory a while back. 
 It seemed like there was maybe some support within Solr that you needed.  I
 assume you're suggesting adding an issue in the Solr  JIRA, right?
 
 Is there something that the Terracotta team can do to help?
 
 Cheers,
 Orion
 
 
 Jeryl Cook wrote:
  
  tried it, but it didn't work that well... so I ended up making my own
  little faceted search engine directly using RAMDirectory and clustering it
  via Terracotta. Not as good as SOLR (smile), but it worked.
  I actually posted some questions a while back trying to get it to work. So
  that Terracotta can hook the RAMDirectory, it may be good to submit this in
  JIRA for Terracotta support!
  
  Jeryl Cook 
   /^\ Pharaoh /^\ 
  
  
  http://pharaohofkush.blogspot.com/ 
  
  
  
  ..Act your age, and not your shoe size..
  
  -Prince(1986)
  
  Date: Wed, 22 Aug 2007 16:18:24 -0300
  From: [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Subject: Solr and terracotta
  
  Recently I ran into this topic. I googled it a little and didn't find
  much
  information.
  It would be great to have solr working with RAMDirectory and Terracotta.
  We
  could stop using crons for rsync, right?
  Has anyone tried that out?
  
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/Solr-and-terracotta-tf4313531.html#a12283537
 Sent from the Solr - User mailing list archive at Nabble.com.
 


RE: Solr and terracotta

2007-08-22 Thread Jeryl Cook
Tried it, but it didn't work that well... so I ended up making my own little
faceted search engine directly using RAMDirectory and clustering it via
Terracotta. Not as good as SOLR (smile), but it worked.
I actually posted some questions a while back trying to get it to work. So
that Terracotta can hook the RAMDirectory, it may be good to submit this in
JIRA for Terracotta support!

Jeryl Cook 
 /^\ Pharaoh /^\ 


http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

 Date: Wed, 22 Aug 2007 16:18:24 -0300
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: Solr and terracotta
 
 Recently I ran into this topic. I googled it a little and didn't find much
 information.
 It would be great to have solr working with RAMDirectory and Terracotta. We
 could stop using crons for rsync, right?
 Has anyone tried that out?


RE: RAMDirectory instead of FSDirectory for SOLR

2007-05-31 Thread Jeryl Cook
That's the thing: Terracotta persists everything it has in memory to disk when
it overflows (you can set how much you want to keep in memory), or when the
server goes offline. When the server comes back, the master Terracotta simply
loads it back into the memory of the once-offline worker. This is identical to
the approach SOLR already uses to handle scalability, and it allows unlimited
storage of items in memory; you just need to cluster the RAMDirectory
according to the sample given by Terracotta. However, I read some of the posts
here saying "I wonder how performance will be", etc. I was trying to get it
working and load test the hell out of it, to see how it acts with large
amounts of data and how it compares with SOLR using the typical FSDirectory
approach. I plan to post my findings.

Jeryl Cook 



/^\ Pharaoh /^\ 

http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

 Date: Thu, 31 May 2007 13:51:53 -0700
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: RE: RAMDirectory instead of FSDirectory for SOLR
 
 : board, looks like i can achieve this with the embedded version of SOLR
 : uses the lucene RAMDirectory to store the index..Jeryl Cook
 
 yeah ... adding a solrconfig.xml option for using a RAMDirectory would be
 possible ... but almost meaningless for most people (the directory would go
 away when the server shuts down) ... even for use cases like what you
 describe (hooking in Terracotta) it wouldn't be enough in itself, because
 there would be no hook to give Terracotta access to it.
 
 -Hoss

RE: RAMDirectory instead of FSDirectory for SOLR

2007-05-31 Thread Jeryl Cook
I have Terracotta working with Lucene, and it works fine with the
RAMDirectory. I am trying to get it to work with SOLR (hook in the
RAMDirectory); when I do, I'll post the findings, problems, etc. Thanks for
the feedback from everyone.

Jeryl Cook 



/^\ Pharaoh /^\ 

http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

 Date: Thu, 31 May 2007 18:24:26 -0700
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: RE: RAMDirectory instead of FSDirectory for SOLR
 
 Jeryl,
 
 If you need any help getting Terracotta to work under Lucene, or if you have
 any questions about performance tuning and/or load testing, you can also use
 the Terracotta community resources (mailing lists, forums, IRC, whatnot):
 http://www.terracotta.org/confluence/display/orgsite/Community
 
 We'd be more than happy to help you get this stuff working.
 
 Cheers,
 Orion
 
 
 Jeryl Cook wrote:
  
  That's the thing: Terracotta persists everything it has in memory to disk
  when it overflows (you can set how much you want to keep in memory), or
  when the server goes offline. When the server comes back, the master
  Terracotta simply loads it back into the memory of the once-offline
  worker. This is identical to the approach SOLR already uses to handle
  scalability, and it allows unlimited storage of items in memory; you just
  need to cluster the RAMDirectory according to the sample given by
  Terracotta. However, I read some of the posts here saying "I wonder how
  performance will be", etc. I was trying to get it working and load test
  the hell out of it, to see how it acts with large amounts of data and how
  it compares with SOLR using the typical FSDirectory approach. I plan to
  post my findings.
  
  Jeryl Cook
  /^\ Pharaoh /^\
  http://pharaohofkush.blogspot.com/
  
  ..Act your age, and not your shoe size..
  -Prince(1986)
  
   Date: Thu, 31 May 2007 13:51:53 -0700
   From: [EMAIL PROTECTED]
   To: solr-user@lucene.apache.org
   Subject: RE: RAMDirectory instead of FSDirectory for SOLR
   
   : board, looks like i can achieve this with the embedded version of SOLR
   : uses the lucene RAMDirectory to store the index..Jeryl Cook
   
   yeah ... adding a solrconfig.xml option for using a RAMDirectory would
   be possible ... but almost meaningless for most people (the directory
   would go away when the server shuts down) ... even for use cases like
   what you describe (hooking in Terracotta) it wouldn't be enough in
   itself, because there would be no hook to give Terracotta access to it.
   
   -Hoss
 
 -- 
 View this message in context: 
 http://www.nabble.com/RAMDirecotory-instead-of-FSDirectory-for-SOLR-tf3843377.html#a10905062
 Sent from the Solr - User mailing list archive at Nabble.com.

RAMDirectory instead of FSDirectory for SOLR

2007-05-30 Thread Jeryl Cook
Is it possible to simply change the configuration to use RAMDirectory instead
of FSDirectory? If not, it would be great to have this as a possible option in
the configuration file. The Master/Worker pattern used for handling
scalability works (it is outlined in the SOLR manual/wiki); it's a proven
pattern. However, Terracotta, http://terracottatech.com/, is able to cluster
the RAMDirectory (items that cannot fit in memory are written to disk). I
would love to take advantage of this approach. Can you tell me if it is
possible to switch out?

Thanks.

Jeryl Cook 
 /^\ Pharaoh /^\ 

http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)