Re: No Live server exception: Solr Cloud 6.6.6
I think you are going in the wrong direction in your upgrade path…. While it may *seem* simpler to go from master/slave 6.6.6 to SolrCloud 6.6.6, you are much better off going straight from master/slave 6.6.6 to SolrCloud on 8.7 (or whatever is the latest). SolrCloud has evolved by two MAJOR versions since Solr 6, and is much more robust, with many fixes. Today, I suspect very few folks who know the innards of Solr are still familiar with the 6.x line! This is also a really good opportunity to revisit your schema and make sure you are using all the features in the best way possible. > On Jan 6, 2021, at 1:40 AM, Ritvik Sharma wrote: > > Hi Guys, > > Any update? > > On Tue, 5 Jan 2021 at 18:06, Ritvik Sharma wrote: > >> Hi Guys >> >> Happy New Year. >> >> We are trying to move to solr cloud 6.6.6 as we are using the same version in a >> master-slave arch. >> >> solr cloud: 6.6.6 >> zk: 3.4.10 >> >> We are facing a few errors: >> 1. Every time we upload a model-store using a curl -XPUT command, it shows up at that time, but after reloading the collection it is removed >> automatically. 
>> >> 2.While querying the data, we are getting below exception, >> >> "msg": "org.apache.solr.client.solrj.SolrServerException: No live >> SolrServers available to handle this request:[ >> http://x.x.x.x:8983/solr/solrcollection_shard1_replica2, >> http://x.x.x.y:8983/solr/solrcollection_shard1_replica1]","trace": >> "org.apache.solr.common.SolrException: >> org.apache.solr.client.solrj.SolrServerException: No live SolrServers >> available to handle this request:[ >> http://x.x.x.x:8983/solr/solrcollection_shard1_replica2, >> http://x.x.x.y:8983/solr/solrcollection_shard1_replica1]\n\tat >> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:416)\n\tat >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)\n\tat >> org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)\n\tat >> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:724)\n\tat >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:530)\n\tat >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)\n\tat >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)\n\tat >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\n\tat >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat >> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat >> >> >> >> >> ___ Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw> This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
Re: data import handler deprecated?
You don’t need to abandon DIH right now…. You can just use the Github hosted version…. The more people who use it, the better the community that will form around it! It’s a bit chicken-and-egg: since no one is actively discussing it, submitting PRs, etc., it may languish. If you use it, test it, and support other community folks using it, then it will continue on! > On Nov 29, 2020, at 12:12 PM, Dmitri Maziuk wrote: > > On 11/29/2020 10:32 AM, Erick Erickson wrote: > >> And I absolutely agree with Walter that the DB is often where >> the bottleneck lies. You might be able to >> use multiple threads and/or processes to query the >> DB if that’s the case and you can find some kind of partition >> key. > > IME the difficult part has always been dealing with incremental updates; if > we were to roll our own, my vote would be for a database trigger that does a > POST in whichever language the DBMS likes. > > But this has not been a part of our "solr 6.5 update" project until now. > > Thanks everyone, > Dima
Re: Solr 8.6.2 - Admin UI Issue
I’ve seen this behavior as well jumping between versions of Solr. Typically in the browser console I see some sort of very opaque Javascript error. > On Oct 8, 2020, at 5:54 AM, Colvin Cowie wrote: > > Images won't be included on the mailing list. You need to put them > somewhere else and link to them. > > With that said, if you're switching between versions, maybe your browser > has the old UI cached? Try clearing the cache / viewing it in a private > window and see if it's any different. > > On Wed, 7 Oct 2020 at 11:22, Vinay Rajput <mailto:vinayrajput4...@gmail.com>> wrote: > >> Hi All, >> >> We are currently using Solr 7.3.1 in cloud mode and planning to upgrade. >> When I bootstrapped Solr 8.6.2 on my local machine and uploaded all >> necessary configs, I noticed one issue in the admin UI. >> >> If I select a collection and go to Files, it shows the content tree with >> all files and folders present in that collection. In Solr 8.6.2, it is >> somehow not showing the folders correctly. In my screenshot, you can see >> that velocity and xslt are folders, and we have some config files inside >> these two folders. Because of this issue, I can't click on folder nodes and >> see children nodes. I checked the network calls, and it looks like we are >> getting the correct data from Solr. So it looks like an Admin UI issue to >> me. >> >> Does anyone know if this is a *known issue* or am I missing something >> here? Has anyone noticed a similar issue? I can confirm that it works >> fine with Solr 7.3.1. 
>> >> [image: image.png][image: image.png] >> >> Left image is for 8.6.2 and right image is for 7.3.1 >> >> Thanks, >> Vinay
Re: Master/Slave
nd the deprecation of CDCR. >>>> >>>> So we are left with the question whether we should expect Master/Slave >>>> replication also to be deprecated; and if so, with what is it expected to >>>> be replaced (since not with CDCR)? Or is it now sufficiently safe to assume >>>> that Master/Slave replication will continue to be supported after all >>>> (since the assertion that it would be replaced by CDCR has been >>>> discredited)? In either case, are there other suggested implementations of >>>> having a read-only SolrCloud receive data from a read/write SolrCloud? >>>> >>>> >>>> Thanks >>>> >>>> -Original Message- >>>> From: Shawn Heisey >>>> Sent: Tuesday, May 21, 2019 11:15 AM >>>> To: solr-user@lucene.apache.org >>>> Subject: Re: SolrCloud (7.3) and Legacy replication slaves >>>> >>>> On 5/21/2019 8:48 AM, Michael Tracey wrote: >>>>> Is it possible to set up an existing SolrCloud cluster as the master for >>>>> legacy replication to a slave server or two? It looks like another >>>> option >>>>> is to use uni-directional CDCR, but not sure what is the best option in >>>> this >>>>> case. >>>> >>>> You're asking for problems if you try to combine legacy replication with >>>> SolrCloud. The two features are not guaranteed to work together. >>>> >>>> CDCR is your best bet. This replicates from one SolrCloud cluster to >>>> another. >>>> >>>> Thanks, >>>> Shawn >>>> >> >
Re: Solr client
Konstantinos, have you seen https://solr.cool/? It’s an aggregation site for all the extensions to Solr. You can add your project there, and that should get it some more awareness! > On Sep 2, 2020, at 2:21 AM, Konstantinos Koukouvis > wrote: > > Hi everybody, sorry in advance if I’m using the mailing list wrong, this is > the first time I’m attempting such a thing. > > To all you gophers out there: we at Mecenat have been working on a new solr > client wrapper with a focus on single solr instance usage, that supports the > search API, schema API and core admin API. With this email I’m trying to > raise awareness in the community and get some feedback by having more people > test every nook and cranny of it, so that we can improve our solution and > hopefully help you find that client that makes using solr in go more > intuitive and simple. > > Here’s the link, and thank you all for your time: > https://github.com/mecenat/solr > > With regards, > Konstantinos >
Loading JSON docs into Solr with Streaming Expressions?
Hey all, I wanted to load some JSON docs into Solr and, as I load them, do some manipulations to the documents as they go in. I looked at https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html <https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html>, however I also wanted to see if Streaming would help. I’ve used the combination of the cat and parseCSV streaming functions successfully to load data into Solr, so I looked a bit at what we could do with JSON as the source format. I didn’t see an obvious path for taking a .json file and loading it, so I played around and made this JSON Lines (JSONL) formatted file streaming expression: https://github.com/epugh/playing-with-solr-streaming-expressions/pull/3 <https://github.com/epugh/playing-with-solr-streaming-expressions/pull/3> The expression looks like commit(icecat, update(icecat, parseJSONL( cat('two_docs.jsonl') ) ) ) I was curious what other folks have done? I saw that there is a JSONTupleStream, but it didn’t quite seem to fit the need. Eric
Checking my understanding of SOLR_HOME
I am struggling with using the zkHost and the JDBC endpoint (https://lucene.apache.org/solr/guide/6_6/parallel-sql-interface.html#jdbc-driver), and I believe it’s because when I deploy, the node gets an IP address that is only reachable inside the network, while externally it is accessible via a DNS name: http://quepid-solr.dev.o19s.com:8985/solr/#/~cloud?view=tree I’m also using Docker, so the internal :8983 gets mapped to the external :8985 port. I *think* what I need to do is: 1) Use the SOLR_HOST parameter to make sure the hostname is “quepid-solr.dev.o19s.com” in my startup script. 2) Set the environment variable SOLR_PORT to be 8985 instead of using the Docker mapping of ports. If this is a correct understanding, then I think adding a bit more documentation to https://lucene.apache.org/solr/guide/8_4/taking-solr-to-production.html#solr-hostname would be useful, and I'm happy to add a documentation PR as it’s not super clear to me. Eric
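For anyone hitting the same thing, here is a sketch of what I believe those two settings look like in the startup environment (the hostname and port come from my setup above; treat the exact file location as illustrative):

```shell
# solr.in.sh (or exported before bin/solr start) -- a sketch:
# SOLR_HOST is the hostname Solr registers in ZooKeeper, so clients that
# read cluster state get an address they can actually reach.
SOLR_HOST="quepid-solr.dev.o19s.com"
# SOLR_PORT makes Solr listen on (and advertise) 8985 directly, so the
# Docker port mapping can be 1:1, e.g.: docker run -p 8985:8985 ...
SOLR_PORT="8985"
```

With the port mapped 1:1, the address Solr advertises in live_nodes matches what external JDBC clients can actually connect to.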
Re: How do I add my own Streaming Expressions?
The documentation in the StreamHandler suggests adding some streamFunctions into solrconfig.xml, naming classes such as:

  org.apache.solr.client.solrj.io.stream.ReducerStream
  org.apache.solr.client.solrj.io.stream.RecordCountStream

See https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/StreamHandler.java#L114 What the StreamHandler documentation describes doesn’t seem to be working; however, in the similar GraphHandler, there is a call to “streamFunctions”: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/GraphHandler.java#L90 I’m still debugging this… Eric > On Nov 15, 2019, at 9:43 PM, Eric Pugh > wrote: > > What is the process for adding new Streaming Expressions? > > It appears that the org.apache.solr.client.solrj.io.Lang method statically > loads all the streaming expressions? > > Eric > >
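For concreteness, the registration the StreamHandler documentation seems to be describing would look roughly like this in solrconfig.xml (a sketch: the function name myFunction and class com.example.MyCustomStream are hypothetical, and per the above I haven't actually gotten StreamHandler to honor it):

```xml
<requestHandler name="/stream" class="solr.StreamHandler">
  <lst name="streamFunctions">
    <!-- maps an expression name to a TupleStream implementation -->
    <str name="myFunction">com.example.MyCustomStream</str>
  </lst>
</requestHandler>
```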
How do I add my own Streaming Expressions?
What is the process for adding new Streaming Expressions? It appears that the org.apache.solr.client.solrj.io.Lang method statically loads all the streaming expressions? Eric
Re: regarding Extracting text from Images
Just to stir the pot on this topic, here is an article about why and how to use Tika inside of Solr: https://opensourceconnections.com/blog/2019/10/24/it-s-okay-to-run-tika-inside-of-solr-if-and-only-if/ > On Oct 23, 2019, at 7:21 PM, Erick Erickson wrote: > > Here’s a blog about why and how to use Tika outside Solr (and an RDBMS too, > but you can pull that part out pretty easily): > https://lucidworks.com/post/indexing-with-solrj/ > > > >> On Oct 23, 2019, at 7:16 PM, Alexandre Rafalovitch >> wrote: >> >> Again, I think you are best to do it outside of Solr. >> >> But even if you want to get it to work in Solr, I think you start by >> getting it to work directly in Tika. Then, get the missing libraries and >> configuration into Solr. >> >> Regards, >> Alex >> >> On Wed, Oct 23, 2019, 7:08 PM suresh pendap, wrote: >> >>> Hi Alex, >>> Thanks for your reply. How do we integrate tesseract with Solr? Do we have >>> to implement a custom update processor or extend the >>> ExtractingRequestProcessor? >>> >>> Regards >>> Suresh >>> >>> On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch >>> >>> wrote: >>> >>>> I believe the Tika that powers this can do so with extra libraries >>> (tesseract?), >>>> but Solr does not bundle those extras. >>>> >>>> In any case, you may want to run Tika externally to avoid the >>>> conversion/extraction process being a burden to Solr itself. >>>> >>>> Regards, >>>>Alex >>>> >>>> On Wed, Oct 23, 2019, 1:58 PM suresh pendap, >>>> wrote: >>>> >>>>> Hello, >>>>> I am reading the Solr documentation about integration with Tika and the >>> Solr >>>>> Cell framework over here >>>>> >>>>> >>>> >>> https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html >>>>> >>>>> I would like to know if the Solr Cell framework can also be used to >>>> extract >>>>> text from image files? 
>>>>> >>>>> Regards >>>>> Suresh >>>>> >>>> >>> >
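For context, the Solr Cell endpoint being discussed is registered in solrconfig.xml along these lines, based on the stock sample configs (a sketch; the fmap.content target field is an assumption about your schema):

```xml
<requestHandler name="/update/extract" startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map the body text Tika extracts into a catch-all search field -->
    <str name="fmap.content">_text_</str>
    <str name="lowernames">true</str>
  </lst>
</requestHandler>
```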
Re: Solr Payload example
Have you checked out https://github.com/o19s/payload-component? On Mon, Oct 21, 2019 at 2:47 PM Erik Hatcher wrote: > How about a single field, with terms like: > > store1_USD|125.0 store2_EUR|220.0 store3_GBP|225.0 > > Would that do the trick? > > And yeah, payload decoding is currently limited to float and int with the > built-in payload() function. We'd need a new way to pull out > textual/bytes payloads - like maybe a DocTransformer? > > Erik > > > > On Oct 21, 2019, at 9:59 AM, Vincenzo D'Amore > wrote: > > > > Hi Erick, > > > > thanks for getting back to me. We started to use payloads because we have > > the classical per-store pricing problem. > > Thousands of stores and different prices. > > Then we found the payloads very useful and started to use them for many > reasons, > > like enabling/disabling the product for a given store, saving the stock > > availability, or saving other info like buy/sell price, discount rates, > > and so on. > > All of that information is numeric, but stores can also be in different > > countries; I mean, it would be useful to also have the currency and other > > attributes related to the store. > > > > Thinking about an alternative to payloads, maybe I could use dynamic > > fields; well, I know it is ugly. > > > > Consider this hypothetical case where I have two payload fields: > > > > payloadPrice: [ > > "store1|125.0", > > "store2|220.0", > > "store3|225.0" > > ] > > > > payloadCurrency: [ > > "store1|USD", > > "store2|EUR", > > "store3|GBP" > > ] > > > > with dynamic fields I could have different fields for each document. > > > > currency_store1_s: "USD" > > currency_store2_s: "EUR" > > currency_store3_s: "GBP" > > > > But how many dynamic fields like this can I have? More than thousands? > > > > Again, I've just started to look at the solr-ocrhighlighting github project you > > suggested. > > They seem to have written their own payload object type to store OCR > > highlighting information. 
> > It seems interesting, I'll take a look immediately. > > > > Thanks again for your time. > > > > Best regards, > > Vincenzo > > > > > > On Mon, Oct 21, 2019 at 2:55 PM Erick Erickson > > wrote: > > > >> This is one of those situations where I know a client did it, but didn’t > >> see the code myself. > >> > >> So I can’t help much. > >> > >> Perhaps a good question at this point, though, is “why do you want to add > >> string payloads anyway”? > >> > >> This isn’t the client, but it might give you some pointers: > >> > >> > >> > https://github.com/dbmdz/solr-ocrpayload-plugin/blob/master/src/main/java/de/digitalcollections/solr/plugin/components/ocrhighlighting/OcrHighlighting.java > >> > >> Best, > >> Erick > >> > >>> On Oct 21, 2019, at 6:37 AM, Vincenzo D'Amore > >> wrote: > >>> > >>> Hi Erick, > >>> > >>> It seems I've reached a dead end, or at least, looking at the code, it seems I can't easily add a custom decoder. > >>> > >>> Looking at the PayloadUtils class, there is a getPayloadDecoder method invoked to return the PayloadDecoder:
> >>>
> >>>   public static PayloadDecoder getPayloadDecoder(FieldType fieldType) {
> >>>     PayloadDecoder decoder = null;
> >>>
> >>>     String encoder = getPayloadEncoder(fieldType);
> >>>
> >>>     if ("integer".equals(encoder)) {
> >>>       decoder = (BytesRef payload) -> payload == null ? 1 :
> >>>           PayloadHelper.decodeInt(payload.bytes, payload.offset);
> >>>     }
> >>>     if ("float".equals(encoder)) {
> >>>       decoder = (BytesRef payload) -> payload == null ? 1 :
> >>>           PayloadHelper.decodeFloat(payload.bytes, payload.offset);
> >>>     }
> >>>     // encoder could be "identity" at this point, in the case of
> >>>     // DelimitedTokenFilterFactory encoder="identity"
> >>>
> >>>     // TODO: support pluggable payload decoders?
> >>>
> >>>     return decoder;
> >>>   }
> >>>
> >>> Any advice to work around this situation? 
> >>> > >>> > >>> On Mon, Oct 21, 2019 at 1:51 AM Erick Erickson < erickerick...@gmail.com> > >>> wrote: > >>> > You’d need to write one. Payloads are generally intended to hold > >> numerics > you can then use in a function query to factor into the score… > > Best, > Erick > > > On Oct 20, 2019, at 4:57 PM, Vincenzo D'Amore > wrote: > > > > Sorry, I just realized that I was wrong in how I'm using the payload > > function. > > Given that the payload function only handles a numeric (integer or > >> float) > > payload, could you suggest an alternative function that handles > strings? > > If not, should I write one? > > > > On Sun, Oct 20, 2019 at 10:43 PM Vincenzo D'Amore < > v.dam...@gmail.com> > > wrote: > > > >> Hi all, > >> > >> I'm trying to understand what I did wrong with a payload query that > >> returns > >> > >> error: { > >> metadata: [ "error-class", "org.apache.solr.common.SolrException", > >> "root-error-class", "org.apache.solr.common.SolrException" ], > >> msg: "No
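For readers following this thread: the payload fields being discussed are typically backed by a field type like the one below, per the stock Solr configsets (a sketch; as the decoder code above shows, only the "integer" and "float" encoders currently get a built-in decoder, while "identity" does not):

```xml
<fieldType name="delimited_payloads_float" class="solr.TextField"
           indexed="true" stored="false">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- turns "store1|125.0" into token "store1" carrying payload 125.0 -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
  </analyzer>
</fieldType>
```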
Custom Jars for a config in the Solr Cloud world..
I've got a Solr instance with a number of cores that are each configured by uploading the configuration information to ZooKeeper. The newest index needs the UIMA jars. Normally I would put them in the core's /lib directory, but since I am only accessing my server via ZooKeeper, I don't have that directory as an option. I know I could manually upload the jars onto the server and then put some sort of path to them, but I'm hoping to manage all uploading of core-specific configurations (and jars) via ZooKeeper. I'm wondering if I am missing something in this new ZooKeeper-enabled world? Just for fun, I'm going to try and put the ~ 2 MB worth of jars inside my /conf/ directory and then upload through ZooKeeper to see what happens. Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Apache Solr 3 Enterprise Search Server available from http://www.packtpub.com/apache-solr-3-enterprise-search-server/book This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
Re: Custom Jars for a config in the Solr Cloud world..
And I can now confirm that yes, ZooKeeper blew up when I attempted to add all the UIMA and content extraction jars to my conf/ directory in ZooKeeper! A couple of small jars did upload, and then it started sending back java.io.IOException: Broken pipe errors. So any thoughts on the best way to manage jars that seem like they should be part of your config? Small jars I think will work, and maybe I just need to tweak my lib/ definitions in my solrconfig.xml to look for all the places that jars may exist, even though on my local box it's different than on my integration Solr box. Just seems a bit messy ;-) Eric On Aug 14, 2012, at 4:40 PM, Jack Krupansky wrote: Dear Eric The Brave, As per the wiki: znodes are limited in the amount of data that they can have. ZooKeeper was designed to store coordination data: status information, configuration, location information, etc. This kind of meta-information is usually measured in kilobytes, if not bytes. ZooKeeper has a built-in sanity check of 1M, to prevent it from being used as a large data store, but in general it is used to store much smaller pieces of data. See: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ProjectDescription Also: jute.maxbuffer: (Java system property: jute.maxbuffer) This option can only be set as a Java system property. There is no zookeeper prefix on it. It specifies the maximum size of the data that can be stored in a znode. The default is 0xfffff, or just under 1M. If this option is changed, the system property must be set on all servers and clients, otherwise problems will arise. This is really a sanity check. ZooKeeper is designed to store data on the order of kilobytes in size. See: http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html -- Jack Krupansky -Original Message- From: Eric Pugh Sent: Tuesday, August 14, 2012 4:11 PM To: solr-user@lucene.apache.org Subject: Custom Jars for a config in the Solr Cloud world.. 
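If someone really did want to push larger blobs through ZooKeeper, the jute.maxbuffer sanity check Jack quotes can be raised, but the same system property has to be set on every ZooKeeper server and every client JVM. A sketch (the 10 MB value is arbitrary, and doing this at all is generally discouraged):

```shell
# On each ZooKeeper server (e.g. appended to JVMFLAGS in conf/java.env):
JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=10485760"
# And the same -D flag on every client JVM, i.e. each Solr instance:
#   java -Djute.maxbuffer=10485760 ... -jar start.jar
```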
Staggering Replication start times
I am playing with an index that is sharded many times, between 64 and 128 shards. One thing I noticed is that with replication set to happen every 5 minutes, each slave hits the master at the same moment asking for updates: :00:00, :05:00, :10:00, :15:00, etc. Replication takes very little time, so it seems like I'm flooding the network with a burst of traffic that then goes away. I tweaked the replication start time code to instead just start 5 minutes after a shard starts up, which means that instead of all of the slaves hitting at the same moment, they are a bit staggered: :00:00, :00:01, :00:02, :00:04, etcetera. Which presumably will use my network pipe more efficiently. Any thoughts on this? I know it means the slaves are more likely to be slightly out of sync, but over a 5 minute range they will get back in sync. Eric
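For reference, the 5-minute interval in question is the slave-side pollInterval on the ReplicationHandler, configured roughly like this (a sketch; the master URL is hypothetical):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
    <!-- interval between polls, HH:mm:ss -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```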
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
Depending on the project, I either pull from ASF Mirrors or Source. However, I do reference Maven repository when writing Java code that is built by Maven. And it's often a pain getting it to work! On Jan 18, 2011, at 4:23 PM, Ryan Aylward wrote: [X] ASF Mirrors (linked in our release announcements or via the Lucene website) [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [X] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
Re: What is the maximum number of documents that can be indexed ?
I would recommend looking at the work the HathiTrust has done. They have published some really great blog articles about the work they have done scaling Solr, and have put in huge amounts of data. The good news is that there isn't an exact number, because it depends. The bad news is that there isn't an exact number because it depends! Eric On Oct 13, 2010, at 8:58 PM, Otis Gospodnetic wrote: Marco (use the solr-u...@lucene list to follow up, please), There are no precise answers to such questions. Solr can keep indexing. The limit is, I think, the available disk space. I've never pushed Solr or Lucene to the point where Lucene index segments would become a serious pain, but even that can be controlled. Same thing with number of open files, large file support, etc. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Marco Ciaramella ciaramellama...@gmail.com To: d...@lucene.apache.org Sent: Wed, October 13, 2010 6:19:15 PM Subject: What is the maximum number of documents that can be indexed ? Hi all, I am working on a performance specification document for a Solr/Lucene-based application; this document is intended for the final customer. My question is: what is the maximum number of documents I can index, assuming 10 or 20 kbytes for each document? I could not find a precise answer to this question, and I tend to consider that the Solr index can be virtually limited only by the JVM, the Operating System (limits to large file support), or by hardware constraints (mainly RAM, etc.). Thanks Marco
Re: Many Tomcat Processes on Server ?!?!?
My guess would be that commons-daemon somehow thinks that Tomcat has gone down and has started up multiple copies... You only need one Tomcat process for your 4-core Solr instance! You may have many other WAR applications hosted in Tomcat; I know a lot of places would have a one-Tomcat-per-deployed-WAR pattern. On Jun 2, 2010, at 9:59 AM, stockii wrote: Hello. Our Server is an 8-Core Server with 12 GB RAM. Solr is running with 4 Cores. 55 Tomcat 5.5 processes are running. is this normal??? htop shows me a list of these processes on the server, and Tomcat has about 55. every process is using: /usr/share/java/commons-daemon.jar:/usr/share/tomcat5.5/bin/bootstrap.jar. is this normal? -- View this message in context: http://lucene.472066.n3.nabble.com/Many-Tomcat-Processes-on-Server-tp864732p864732.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
Re: RIA sample and minimal JARs required to embed Solr
Glad to hear someone is looking at Solr not just as a web-enabled search engine, but as a simpler/more powerful interface to Lucene! When you download the source code, look at the Chapter 8 Crawler project, specifically Indexer.java; it demonstrates both how to index into a traditional separate Solr process and how to fire up an embedded Solr. It is remarkably easy to interact with an embedded Solr! In terms of minimal dependencies, what you need for a standalone Solr (outside of a servlet container like Tomcat/Jetty) is what you need for an embedded Solr. Eric On May 29, 2010, at 9:32 PM, Thomas J. Buhr wrote: Solr, The Solr 1.4 EES book arrived yesterday and I'm very much enjoying it. I was glad to see that rich clients are one case for embedding Solr, as this is the case for my application. Multi Cores will also be important for my RIA. The book covers a lot and makes it clear that Solr has extensive abilities. There is however no clean and simple sample of embedding Solr in a RIA in the book, only a few alternate language usage samples. Is there a link to a Java sample that simply embeds Solr for local indexing and searching using Multi Cores? Also, what kind of memory footprint am I looking at for embedding Solr? What are the minimal dependencies? Thom - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
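For the archive, here is roughly what firing up an embedded Solr looked like with the 1.3/1.4-era SolrJ API. This is a sketch, not a runnable standalone program: it assumes the solr-core and solrj jars on the classpath plus a solr home directory with solrconfig.xml and schema.xml, and the path, core name, and field names below are placeholders.

```java
// Sketch: embedded Solr in the 1.3/1.4 API line (verify names against your release).
System.setProperty("solr.solr.home", "/path/to/solr/home"); // hypothetical path
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
CoreContainer container = initializer.initialize();
SolrServer solr = new EmbeddedSolrServer(container, "core0");

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
doc.addField("title_t", "embedded example");
solr.add(doc);
solr.commit();

container.shutdown(); // release index locks when done
```

From there the SolrServer API is identical to the HTTP client, which is what makes embedding so painless.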
Re: Tomcat 5.5 Security Constraint
I've had the exact same frustration with Multicore and Solr... You need to explicitly lay out each pattern with the core name in it. On Fri, Mar 26, 2010 at 8:35 AM, stockii st...@shopgate.com wrote: Heya hey. i have a little trouble with my tomcat and my security-constraint. i have 4 cores, and in these cores everything should be protected via username and pwd, but not the select! my cores are: .../solr/search/admin/ .../solr/suggest/admin/ .../solr/searchpg/admin/ .../solr/suggestpg/admin/ this is my security-constraint:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>solr</web-resource-name>
    <url-pattern>/solr/*/admin/*</url-pattern>
    <http-method>GET</http-method>
    <http-method>POST</http-method>
  </web-resource-collection>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>*</realm-name>
</login-config>

only the pattern /solr/*/admin/* should be closed, but no url-pattern is working, only /*. can any help me? thx !! -- View this message in context: http://n3.nabble.com/Tomcat-5-5-Security-Constraint-tp676516p676516.html Sent from the Solr - User mailing list archive at Nabble.com.
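Spelling out Eric's suggestion: the servlet spec only allows a single wildcard at the very start or very end of a url-pattern, so a pattern like /solr/*/admin/* never matches, and you have to enumerate the cores. A sketch of the web.xml fragment, using the core names from the post (patterns are relative to the webapp context, so they drop the /solr prefix; the role and realm names here are made up):

```xml
<security-constraint>
  <web-resource-collection>
    <web-resource-name>solr admin</web-resource-name>
    <!-- one pattern per core: servlet url-patterns allow only one
         leading or trailing wildcard, never one in the middle -->
    <url-pattern>/search/admin/*</url-pattern>
    <url-pattern>/suggest/admin/*</url-pattern>
    <url-pattern>/searchpg/admin/*</url-pattern>
    <url-pattern>/suggestpg/admin/*</url-pattern>
    <http-method>GET</http-method>
    <http-method>POST</http-method>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>solr</realm-name>
</login-config>
```

Since /select is not listed, queries stay open while every admin path requires BASIC auth.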
Re: Updating FAQ for International Characters?
So I am using Sunspot to post over, which means an extra layer of indirection between me and my XML! I will look tomorrow. On Mar 10, 2010, at 7:21 PM, Chris Hostetter wrote: : Any time a character like that was indexed, Solr threw an unknown entity error. : But if converted to &#192; or &Agrave; then everything works great. : : I tried out using Tomcat versus Jetty and got the same results. Before I edit Uh, you mean like the characters in exampledocs/utf8-example.xml ? it contains literal utf8 characters, and it works fine. Based on your &#192; comment I assume you are posting XML ... are you sure you are using the utf8 charset? -Hoss - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
Updating FAQ for International Characters?
Hi all, On the wiki page http://wiki.apache.org/solr/FAQ under the section Why don't International Characters Work? there are a number of options specified for dealing with a character like À (an A with a grave accent). Any time a character like that was indexed, Solr threw an unknown entity error. But if converted to &#192; or &Agrave; then everything worked great. I tried out using Tomcat versus Jetty and got the same results. Before I edit the FAQ, I wanted to confirm that others also haven't been able to fully index documents with characters like À. Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
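For what it's worth, the "unknown entity" symptom usually means the bytes on the wire were not the UTF-8 the XML parser expected. A small stdlib-only Java check (nothing Solr-specific) showing why the raw character and the numeric entity behave differently:

```java
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    public static void main(String[] args) {
        String s = "À"; // U+00C0; as a numeric entity this is &#192;
        // Sent as UTF-8, the character is two bytes (0xC3 0x80); sent as
        // Latin-1 it is one byte (0xC0), which is malformed as UTF-8 and
        // makes an XML parser expecting UTF-8 choke.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(s.codePointAt(0)); // 192
        System.out.println(utf8.length);      // 2
        System.out.println(latin1.length);    // 1
    }
}
```

The entity form &#192; sidesteps the charset question entirely because it is plain ASCII, which is why it "works" even when the request charset is wrong.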
Re: Solr YUI autocomplete
It does, have you looked at http://wiki.apache.org/solr/SolJSON?highlight=%28json%29#Using_Solr.27s_JSON_output_for_AJAX? Also, in my book on Solr there is an example, but using the jQuery autocomplete, which I think was answered earlier on the thread! Hope that helps. ANKITBHATNAGAR wrote: Does Solr support JSONP (JSON with Padding) in the response? -Ankit -Original Message- From: Ankit Bhatnagar [mailto:abhatna...@vantage.com] Sent: Friday, October 30, 2009 10:27 AM To: 'solr-user@lucene.apache.org' Subject: Solr YUI autocomplete Hi Guys, I have a question regarding how to specify the callback. I am using the YUI autocomplete widget and it expects a JSONP response. http://localhost:8983/solr/select/?q=monitor&version=2.2&start=0&rows=10&indent=on&wt=json&json.wrf= I am not sure how I should specify the json.wrf=function Thanks Ankit -- View this message in context: http://old.nabble.com/JQuery-and-autosuggest-tp26130209p26157130.html Sent from the Solr - User mailing list archive at Nabble.com.
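To make the json.wrf behavior concrete: with wt=json&json.wrf=callbackName, Solr returns the JSON body wrapped in a call to callbackName, which is exactly the shape a JSONP widget wants. A stdlib-only sketch; the URL parameters are from the thread, but the response body below is made up, not a real Solr payload:

```java
public class JsonpSketch {
    // Mimics what json.wrf does on the server side: wrap the JSON in a callback.
    static String wrap(String callback, String json) {
        return callback + "(" + json + ")";
    }
    public static void main(String[] args) {
        String url = "http://localhost:8983/solr/select/?q=monitor&wt=json"
                   + "&json.wrf=handleResults"; // YUI supplies this callback name
        String body = wrap("handleResults", "{\"response\":{\"numFound\":0}}");
        System.out.println(url);
        System.out.println(body); // handleResults({"response":{"numFound":0}})
    }
}
```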
Re: Solr 1.4 schedule?
Very soon, I think, is the answer. As well as when it's ready. Solr 1.4 is waiting for the next release of Lucene, which is very soon. Once Lucene comes out, Solr will follow in a week or two, barring release issues. Also, if you look at JIRA: http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12313351 you can see that there are 34 open issues still assigned to 1.4 Eric On Tue, Aug 4, 2009 at 8:08 AM, Robert Young r...@roryoung.co.uk wrote: Hi, When is Solr 1.4 scheduled for release? Is there any ballpark date yet? Thanks Rob
Re: Merging SOLR Documents
What you are talking about is federated search, and it is beyond the scope of Solr. However, maybe you can merge the two indexes into one index, and then distribute over multiple servers to get the performance you are looking for? http://wiki.apache.org/solr/DistributedSearch Eric On Jul 3, 2009, at 7:24 AM, Amandeep Singh09 wrote: Hi list, I am new to this list and just starting solr. My question is how can we merge the results of two different searches. I mean if we have a function that has two threads, it has to go to two different solr servers to get the result. Is there any way to merge the result using solr and solrj or do we have to do it in java only? Thanks Amandeep Singh - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
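Eric's merge-then-distribute suggestion relies on Solr's distributed search, where one query fans out to the servers listed in the shards parameter (see the DistributedSearch wiki page above). A stdlib-only sketch of building such a request URL; the host names are made up:

```java
public class ShardsSketch {
    // Build the shards= parameter from a list of host:port/solr entries.
    static String shardsParam(String... shards) {
        return "shards=" + String.join(",", shards);
    }
    public static void main(String[] args) {
        String url = "http://solr1:8983/solr/select?q=ipod&"
                   + shardsParam("solr1:8983/solr", "solr2:8983/solr");
        System.out.println(url);
        // http://solr1:8983/solr/select?q=ipod&shards=solr1:8983/solr,solr2:8983/solr
    }
}
```

Note the shard entries deliberately omit the http:// prefix; that is the convention the distributed search syntax uses.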
Re: Upgrade to solr 1.4
Solr in general is fairly stable in trunk. That isn't to say that a critical error can't get through, because that does happen, but the test suite is pretty comprehensive. With Solr 1.4 getting closer and closer, I think you'll see the pace of change dropping off. I think it's one of those things that you have to judge for yourself: are the features/fixes/enhancements in 1.4 trunk worth a potential risk? I assume that as part of deployment into production you have some sort of defined criteria that says Solr can be added? Testing of server capacity/performance etc? Those might tell you if there are any issues with Solr 1.4 trunk that would need to delay your deployment. Eric On Jun 26, 2009, at 10:58 AM, Julian Davchev wrote: David Baker wrote: Hi, I need to upgrade from solr 1.3 to solr 1.4. I was wondering if there is a particular revision of 1.4 that I should use that is considered very stable for a production environment? Well, if it's not pronounced stable and given on the download page, I don't think you can rely on it being very stable for a production environment. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: building custom RequestHandlers
Are you using the JavaScript interface to Solr? http://wiki.apache.org/solr/SolrJS It may provide much of what you are looking for! Eric On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: I am using solr and php quite nicely. Currently the workflow includes some manipulation on the php side so I correctly format the query string and pass it to tomcat/solr. I somehow want to build my own request handler in java so I skip the whole apache/php request that is just for formatting. This will save me tons of requests to apache since I use solr directly from javascript. Would like to ask if there is something ready that I can use and adjust. I am kinda new in Java but once I get the pointers I think I should be able to pull it out. Thanks, JD - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: building custom RequestHandlers
Like most things JavaScript, I found that I had to just dig through it and play with it. However, the Reuters demo site was very easy to customize to interact with my own Solr instance, and I went from there. On Jun 23, 2009, at 11:30 AM, Julian Davchev wrote: Never used it.. I am just looking in the docs for how I can extend solr, but no luck so far :( Hoping for some docs or a real extension example. Eric Pugh wrote: Are you using the JavaScript interface to Solr? http://wiki.apache.org/solr/SolrJS It may provide much of what you are looking for! Eric On Jun 23, 2009, at 10:27 AM, Julian Davchev wrote: I am using solr and php quite nicely. Currently the workflow includes some manipulation on the php side so I correctly format the query string and pass it to tomcat/solr. I somehow want to build my own request handler in java so I skip the whole apache/php request that is just for formatting. This will save me tons of requests to apache since I use solr directly from javascript. Would like to ask if there is something ready that I can use and adjust. I am kinda new in Java but once I get the pointers I think I should be able to pull it out. Thanks, JD - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Possible Containers
Can you highlight what problems you've had? Solr doesn't have any really odd aspects about it that would prevent it from running in any kind of servlet container. Eric On Jun 15, 2009, at 6:18 PM, John Martyniak wrote: I have been using jetty and have been really happy with the ease of use and performance. -John On Jun 15, 2009, at 3:41 PM, Andrew Oliver wrote: I've had it running in Jetty and Tomcat. Tomcat 6 + JDK6 have some nice performance semantics especially with non-blocking IO, persistent connections, etc. It is likely that it will run in Resin, though I haven't tried it. It will also likely run in any of the Tomcat-based stuff (i.e. TC Server from Spring Source, JBossAS from Red Hat) -Andy On Mon, Jun 15, 2009 at 2:25 PM, Mukerjee, Neiloy (Neil)neil.muker...@alcatel-lucent.com wrote: Having tried Tomcat and not come to much success upon the realization that I'm using Tomcat 5.5 for other projects I'm working on and that I would be best off using Tomcat 6 for Solr v1.3.0, I am in search of another possible container. What have people used successfully that would be a good starting point for me to try out? John Martyniak President/CEO Before Dawn Solutions, Inc. 9457 S. University Blvd #266 Highlands Ranch, CO 80126 o: 877-499-1562 c: 303-522-1756 e: j...@beforedawnsoutions.com w: http://www.beforedawnsolutions.com - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: user feedback in solr
You can look at the HTTP server logs output by Jetty (or whatever server you have); they provide a lot of visibility into what people are looking for. However, there isn't, that I know of, a ready-to-roll analytics package for Solr. It would be cool though! Eric On Jun 10, 2009, at 8:28 AM, Pooja Verlani wrote: Hi all, I wanted to know if there is any provision to accommodate user feedback in the form of query logs and click logs, to improve the search relevance and ranking. Also, is there a possibility of it being included in the next version? Thank you, Regards, Pooja - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
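Mining the request log for query terms is mostly a one-liner. A stdlib-only Java sketch; the log line below is a made-up NCSA-style example, so adjust the regex to whatever format your container actually writes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryLogMiner {
    // Pulls the q= parameter value out of a request-log line, if present.
    static String extractQuery(String logLine) {
        Matcher m = Pattern.compile("[?&]q=([^&\\s\"]+)").matcher(logLine);
        return m.find() ? m.group(1) : null;
    }
    public static void main(String[] args) {
        String line = "127.0.0.1 - - [10/Jun/2009:08:28:00 +0000] "
                    + "\"GET /solr/select?q=ipod&rows=10 HTTP/1.1\" 200 1234";
        System.out.println(extractQuery(line)); // ipod
    }
}
```

Feed every line of the access log through this and count the results, and you have a crude query-popularity report; click logs would have to come from your front end, since Solr itself never sees the clicks.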
Re: How to disable posting updates from a remote server
Take a look at the security section in the wiki; you could do this with firewall rules or password access. On Thursday, June 4, 2009, ashokc ash...@qualcomm.com wrote: Hi, I find that I am freely able to post to my production SOLR server from any other host that can run the post command. So somebody can wipe out the whole index by posting a delete query. Is there a way SOLR can be configured so that it will take updates ONLY from the server on which it is running? Thanks - ashok -- View this message in context: http://www.nabble.com/How-to-disable-posting-updates-from-a-remote-server-tp23876170p23876170.html Sent from the Solr - User mailing list archive at Nabble.com.
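To illustrate the firewall-rule option, here is a sketch of what that could look like with iptables. The port, source address, and rule order are assumptions; in a real ruleset, placement relative to your existing rules matters:

```shell
# Allow traffic to Solr's port only from localhost; drop it from
# everywhere else. Assumes Solr listens on 8983 and iptables is in use.
iptables -A INPUT -p tcp --dport 8983 -s 127.0.0.1 -j ACCEPT
iptables -A INPUT -p tcp --dport 8983 -j DROP
```

This blocks queries from other hosts too; if remote searching must stay open, a fronting proxy that forwards /select but not /update is the usual alternative.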
Re: How to get number of optimizes
Not sure if it's simpler, but the JMX interface is more structured. I think that just grabbing the page and parsing out the content with your favorite tool (Ruby's Hpricot, say) is pretty simple. Eric On Jun 1, 2009, at 1:17 PM, iamithink wrote: Hello, I'm looking for a simple way to automate (in a shell script) a request for the number of times an index has been optimized (since the Solr webapp last started). I know that this information is available on the Solr stats page (http://host:port/solr/admin/stats.jsp) under Update Handlers/stats/optimizes, but I'm looking for a simpler way than to retrieve the page using wget or similar and parse the HTML. More generally, is there a convenient way to get at the other data presented on the Stats page? I'm currently using Solr 1.2 but will be migrating to 1.3 soon, in case that makes a difference. Thanks... -- View this message in context: http://www.nabble.com/How-to-get-number-of-optimizes-tp23818563p23818563.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
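A stdlib-only Java sketch of the scrape-and-parse approach, to mirror the Hpricot suggestion. The XML fragment here is made up to look like the stats data; for the real thing, fetch http://host:port/solr/admin/stats.jsp and adjust the element names to what your Solr version actually emits:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class StatsScraper {
    // Returns the optimize count found in the XML, or -1 if missing/unparsable.
    static int optimizes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return Integer.parseInt(
                doc.getElementsByTagName("optimizes").item(0).getTextContent().trim());
        } catch (Exception e) {
            return -1;
        }
    }
    public static void main(String[] args) {
        // Hypothetical fragment shaped like the stats page data.
        String sample = "<stats><updateHandler><optimizes>3</optimizes></updateHandler></stats>";
        System.out.println(optimizes(sample)); // 3
    }
}
```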
Map tika attribute to be the id in Solr Cell
Hi all, I want to use the Tika attribute stream_name as my unique key, which I can do if I specify <uniqueKey>stream_name</uniqueKey> and run curl: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.capture=stream_name&ext.map.stream_name=stream_name" -F fi...@angeleyes.kar However, this means that I can't use the ext.metadata.prefix to capture the other metadata fields via: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.metadata.prefix=metadata_&ext.capture=stream_name&ext.map.stream_name=stream_name" -F fi...@angeleyes.kar If I do, it seems like stream_name is lost because it is now metadata_stream_name, but I can't use that name in my ext.capture and ext.map: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.metadata.prefix=metadata_&ext.capture=metadata_stream_name&ext.map.metadata_stream_name=stream_name" -F fi...@angeleyes.kar Any ideas? Currently seems like an either/or, but I'd like both! Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Map tika attribute to be the id in Solr Cell
Grant, I went back and tried to recreate my bug using the example app. Indexing example/site/tutorial.pdf, I get the error with this command: budapest:site epugh$ curl "http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.metadata.prefix=metadata_&ext.map.stream_name=id" -F fi...@tutorial.pdf If I remove the ext.metadata.prefix, then I am okay, but then I can't use dynamic fields for indexing metadata fields. So this works, but I have to manually create all my fields: budapest:site epugh$ curl "http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.map.stream_name=id" -F fi...@tutorial.pdf Eric On May 28, 2009, at 8:28 PM, Grant Ingersoll wrote: On May 28, 2009, at 11:29 AM, Eric Pugh wrote: Hi all, I want to use the Tika attribute stream_name as my unique key, which I can do if I specify <uniqueKey>stream_name</uniqueKey> and run curl: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.capture=stream_name&ext.map.stream_name=stream_name" -F fi...@angeleyes.kar Why do you need to have the ext.capture and why do you need to map stream_name to stream_name? If the name in the tika metadata is a field name, you don't need to map. Also, I assume I'm missing something here because why can't you just pass in id=name of the stream since presumably, in your examples anyway, you have this info, right? If not, I don't know where else you are getting it from, b/c it is a Solr thing, not a Tika thing. In fact, that reminds me, I should document those values that the ERH adds to the Metadata.
However, this means that I can't use the ext.metadata.prefix to capture the other metadata fields via: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.metadata.prefix=metadata_&ext.capture=stream_name&ext.map.stream_name=stream_name" -F fi...@angeleyes.kar If I do, it seems like stream_name is lost because it is now metadata_stream_name, but I can't use that name in my ext.capture and ext.map: curl "http://localhost:8983/solr/karaoke/update/extract?ext.def.fl=text&ext.metadata.prefix=metadata_&ext.capture=metadata_stream_name&ext.map.metadata_stream_name=stream_name" -F fi...@angeleyes.kar Any ideas? Currently seems like an either/or, but I'd like both! Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Map tika attribute to be the id in Solr Cell
Updating to the latest and greatest added that data, thank you for the pointer. Too many copies of Solr 1.4 trunk, and I'd neglected to update. However, the issue with the mapping not working with the ext.metadata.prefix seems to remain: budapest:site epugh$ curl "http://localhost:8983/solr/update/extract?ext.def.fl=text&ext.map.stream_name=id&ext.metadata.prefix=metadata_" -F fi...@tutorial.pdf <body><h2>HTTP ERROR: 500</h2><pre>org.apache.solr.common.SolrException: Document [null] missing required field: id</pre></body> Eric On May 28, 2009, at 8:56 PM, Grant Ingersoll wrote: On May 28, 2009, at 8:47 PM, Eric Pugh wrote: Grant, you are quite right! I was too far down in the weeds, and didn't need to be doing all that craziness. And I don't actually see the metadata fields. I would expect to, however! What revision are you running? The following was added to ERH on 4/24/09, r768281, (see SOLR-1128) to solve this exact problem: String[] names = metadata.names(); NamedList metadataNL = new NamedList(); for (int i = 0; i < names.length; i++) { String[] vals = metadata.getValues(names[i]); metadataNL.add(names[i], vals); } rsp.add(stream.getName() + "_metadata", metadataNL); - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: DIH uses == instead of = in SQL
Argh... I learned a lesson (yet again!)... I spent an hour setting up detailed logging and digging around in lots of DIH source, with no real luck finding the offending == versus =. I mentioned my frustration to a colleague and he pointed out, right where I had checked multiple times, that I had typed == instead of = in my SQL statement! Eric On May 23, 2009, at 12:02 AM, Noble Paul നോബിള് नोब्ळ् wrote: are you using delta-import w/o a deltaImportQuery? pls paste the relevant portion of data-config.xml On Sat, May 23, 2009 at 12:13 AM, Eric Pugh ep...@opensourceconnections.com wrote: I am getting this error: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '=='1433'' at line 1 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) during a select for a specific institution: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select institution_id, name, acronym as i_acronym from institutions where institution_id=='1433' Processing Document # 1 at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:248) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:205) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) I just switched to using the paired deltaImportQuery and deltaQuery approach. I am using the latest from trunk. Any ideas?
Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- - Noble Paul | Principal Engineer| AOL | http://aol.com - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
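For reference, the shape of the paired queries involved in this thread. This is a sketch of a data-config.xml entity: the table and column names are taken from the error message above, the last_modified column is an assumption, and the delta variable syntax is from the DataImportHandler wiki of that era. Note the single = in deltaImportQuery, which is where the typo crept in:

```xml
<entity name="institution"
        query="select institution_id, name, acronym as i_acronym from institutions"
        deltaQuery="select institution_id from institutions
                    where last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select institution_id, name, acronym as i_acronym
                          from institutions
                          where institution_id='${dataimporter.delta.institution_id}'"/>
```

DIH substitutes each id returned by deltaQuery into deltaImportQuery, so a syntax slip there only surfaces at delta-import time, exactly as in the stack trace above.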
DIH uses == instead of = in SQL
I am getting this error: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '=='1433'' at line 1 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) during a select for a specific institution: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select institution_id, name, acronym as i_acronym from institutions where institution_id=='1433' Processing Document # 1 at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:248) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:205) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) I just switched to using the paired deltaImportQuery and deltaQuery approach. I am using the latest from trunk. Any ideas? Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Cleanly shutting down Solr/Jetty on Windows
Wouldn't you want to run it as a Windows service and use net start / net stop? If you download and install Jetty, it comes with the appropriate scripts to be installed as a service. Eric On May 20, 2009, at 12:39 PM, Chris Harris wrote: I'm running Solr with the default Jetty setup on Windows. If I start solr with java -jar start.jar from a command window, then I can cleanly shut down Solr/Jetty by hitting Control-C. In particular, this causes the shutdown hook to execute, which appears to be important. However, I don't especially want to run Solr from a command window. Instead, I want to launch it from a scheduled task, which does the java -jar start.jar in a non-interactive way and which does not bring up a command window. If I were on unix I could use the kill command to send an appropriate signal to the JVM, but I gather this doesn't work on Windows. As such, what is the proper way to cleanly shut down Solr/Jetty on Windows, if they are not running in a command window? The main way I know how to kill Solr right now if it's running outside a command window is to go to the Windows task manager and kill the java.exe process there. But this seems to kill java immediately, so I'm doubtful that the shutdown hook is getting executed. I found a couple of threads through Google suggesting that Jetty now has a stop.jar script that's capable of stopping Jetty in a clean way across platforms. Is this maybe the best option? If so, would it be possible to include stop.jar in the Solr example/ directory? - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
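On the stop.jar question: the Jetty start.jar itself can also stop a running instance over a local socket, if the instance was started with a stop port and key configured. A sketch; the flag names below are from the Jetty 6 line that Solr's example bundled at the time, so verify against your Jetty version, and the port and key values are placeholders:

```shell
# Start Solr's example Jetty with a stop port configured
java -DSTOP.PORT=8079 -DSTOP.KEY=secret -jar start.jar

# Later, from another (non-interactive) process, request a clean shutdown;
# this goes through Jetty's normal shutdown path, so shutdown hooks run.
java -DSTOP.PORT=8079 -DSTOP.KEY=secret -jar start.jar --stop
```

Because the stop command is just another java invocation, it works fine from a scheduled task with no console window, which was the original constraint.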
Re: Shutting down an instance of EmbeddedSolrServer
I created ticket SOLR-1178 for the small tweak. https://issues.apache.org/jira/browse/SOLR-1178 Eric On May 5, 2009, at 12:26 AM, Noble Paul നോബിള് नोब्ळ् wrote: hi Eric, there should be a getter for CoreContainer in EmbeddedSolrServer. Open an issue --Noble On Tue, May 5, 2009 at 12:17 AM, Eric Pugh ep...@opensourceconnections.com wrote: Hi all, I notice that when I use EmbeddedSolrServer I have to use Control-C to stop the process. I think the way to shut it down is by calling coreContainer.shutdown(). However, is it possible to get the coreContainer from a SolrServer object? Right now it is defined as protected final CoreContainer coreContainer;. I wanted to do: ((EmbeddedSolrServer) solr).getCoreContainer().shutdown(); But it seems I need to keep my own reference to the coreContainer? Is changing this worth a patch? Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- --Noble Paul - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
XPath query support in Solr Cell
So I am trying to filter down what I am indexing, and the basic XPath queries don't work. For example, working with tutorial.pdf this indexes all the <div/> elements: curl "http://localhost:8983/solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.map.div=foo_t&ext.capture=div&ext.literal.id=126&ext.xpath=/xhtml:html/xhtml:body/descendant::node()" -F tutori...@tutorial.pdf However, if I want to only index the first div, I expect to do this: budapest:site epugh$ curl "http://localhost:8983/solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.map.div=foo_t&ext.capture=div&ext.literal.id=126&ext.xpath=/xhtml:html/xhtml:body/xhtml:div[1]" -F tutori...@tutorial.pdf But I keep getting back an issue from curl. My attempts to escape the [1] have failed. Any suggestions? curl: (3) [globbing] error: bad range specification after pos 174 Eric PS: this site seems to be okay as a place to upload your html and practice XPath: http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm I did have to strip out the namespace stuff though. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
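Two ways around that curl error, for what it's worth: curl's -g / --globoff flag turns off its [] range-globbing entirely, or you can percent-encode the brackets so they never reach the globber. A stdlib-only Java sketch of the encoding, using the XPath string from the message above:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class XPathParamEncoder {
    public static void main(String[] args) {
        String xpath = "/xhtml:html/xhtml:body/xhtml:div[1]";
        // URLEncoder does form-encoding, so it also escapes '/' and ':',
        // which is fine for a query-parameter *value*. (Charset overload
        // requires Java 10+.)
        String encoded = URLEncoder.encode(xpath, StandardCharsets.UTF_8);
        System.out.println("ext.xpath=" + encoded);
        // the brackets come out as %5B1%5D, which curl's globbing ignores
    }
}
```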
Re: Solr vs Sphinx
Something that would be interesting is to share Solr configs for various types of indexing tasks: from a Solr configuration aimed at indexing web pages, to one handling large amounts of text, to one that indexes specific structured data. I could see those being posted on the wiki and helping folks who say "I want to do X, is there an example?". I think most folks start with the example Solr install and tweak from there, which probably isn't the best path... Eric On May 15, 2009, at 8:09 AM, Mark Miller wrote: In the spirit of good defaults: I think we should change the Solr highlighter to highlight phrase queries by default, as well as prefix, range, and wildcard constant-score queries. It's awkward to have to tell people you have to turn those on. I'd certainly prefer to have to turn them off if I have some limitation rather than on. - Mark - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: CommonsHttpSolrServer vs EmbeddedSolrServer
CommonsHttpSolrServer is how you access Solr from a Java client via HTTP; you can connect to a Solr running anywhere. EmbeddedSolrServer starts up Solr internally and connects directly, all in a single JVM... Embedded may be faster, the jury is out, but you have to have your Solr server and your Solr client on the same box... Unless you really need it, I would start with CommonsHttpSolrServer; it's easier to configure and get going with, and more flexible. Eric On May 14, 2009, at 1:30 PM, sachin78 wrote: What is the difference between EmbeddedSolrServer and CommonsHttpSolrServer? Which is the preferred server to use? In some blog I read that EmbeddedSolrServer is 50% faster than CommonsHttpSolrServer, so why do we need to use CommonsHttpSolrServer? Can anyone please guide me on the right path/way, so that I pick the right implementation. Thanks in advance. --Sachin -- View this message in context: http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
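A side-by-side sketch of the two, in the Solr 1.3/1.4-era SolrJ API. This assumes the solrj jar (plus solr-core for the embedded case, and a configured solr home on disk) on the classpath, so it is an illustration rather than a runnable standalone program; the URL, path, and core name are placeholders:

```java
// Remote: talks HTTP to any running Solr instance
// (the constructor throws MalformedURLException).
SolrServer remote = new CommonsHttpSolrServer("http://localhost:8983/solr");

// Embedded: boots Solr inside this JVM from a solr home on disk.
System.setProperty("solr.solr.home", "/path/to/solr/home"); // hypothetical path
CoreContainer.Initializer init = new CoreContainer.Initializer();
CoreContainer container = init.initialize();
SolrServer embedded = new EmbeddedSolrServer(container, "core0");

// Either way, the client API is the same from here on:
QueryResponse rsp = remote.query(new SolrQuery("ipod"));
```

The shared SolrServer interface is the point: code written against CommonsHttpSolrServer can usually switch to embedded later without changes beyond construction.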
Re: StatsComponent and 1.3
I'm guessing that manipulating the client end, acts_as_solr, is an easier approach than backporting server-side functionality, especially as you will have to forward-migrate at some point. Out of curiosity, which version of acts_as_solr are you using? The plugin has moved homes a couple of times, and I have heard and found that the version by Mathias Meyer at http://github.com/mattmatt/acts_as_solr/tree/master is the best. I've used it with 1.4 trunk with no issues, and Mathias has been very responsive. Eric On May 7, 2009, at 10:25 PM, David Shettler wrote: Foreword: I'm not a java developer :) OSVDB.org and datalossdb.org make use of solr pretty extensively via acts_as_solr. I found myself with a real need for some of the StatsComponent stuff (mainly the sum feature), so I pulled down a nightly build and played with it. StatsComponent proved perfect, but... the nightly build output seems to be different, and thus incompatible with acts_as_solr. Now, I realize this is more or less an acts_as_solr issue, but... Is it possible, with some degree of effort (obviously) for me to essentially port some of the functionality of StatsComponent to 1.3 myself? It's that, or waiting for 1.4 to come out and someone developing support for it into acts_as_solr, or myself fixing what I have for acts_as_solr to work with the output. I'm just trying to gauge the easiest solution :) Any feedback or suggestions would be grand. Thanks, Dave Open Security Foundation - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: How to index the documents in Apache Solr
I would also recommend starting with the out-of-the-box Jetty. Otherwise you are trying to learn both the basics of Solr and how to stand it up in Tomcat. It's not hard, but learn Solr basics first, then move to more advanced topics. Eric On May 6, 2009, at 9:57 AM, Erik Hatcher wrote: On May 6, 2009, at 5:11 AM, uday kumar maddigatla wrote: The link which shows the things in Jetty. But i'm using Tomcat. hi, If i run the command which is given in the link, it is trying to post the indexes at port number 8983. But in my case my tomcat is running on 8080. Where to change the port. ~/dev/solr/example/exampledocs: java -jar post.jar -help SimplePostTool: version 1.2 This is a simple command line tool for POSTing raw XML to a Solr port. XML data can be read from files specified as commandline args; as raw commandline arg strings; or via STDIN. Examples: java -Ddata=files -jar post.jar *.xml java -Ddata=args -jar post.jar '<delete><id>42</id></delete>' java -Ddata=stdin -jar post.jar hd.xml Other options controlled by System Properties include the Solr URL to POST to, and whether a commit should be executed. These are the defaults for all System Properties: -Ddata=files -Durl=http://localhost:8983/solr/update -Dcommit=yes Erik - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
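The help output quoted above contains the answer to the port question: post.jar's target is the `url` system property, defaulting to http://localhost:8983/solr/update. A minimal sketch of pointing it at a Tomcat instance on port 8080 (assuming Solr is deployed under the usual /solr context path; adjust if yours differs):

```shell
# post.jar defaults to -Durl=http://localhost:8983/solr/update;
# override the property to target Tomcat on 8080 instead.
PORT=8080
URL="http://localhost:${PORT}/solr/update"
# The command to run from example/exampledocs:
echo "java -Durl=${URL} -jar post.jar *.xml"
```

No Solr-side change is needed; the tool simply POSTs to whatever URL the property names.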
Shutting down an instance of EmbeddedSolrServer
Hi all, I notice that when I use EmbeddedSolrServer I have to use Control-C to stop the process. I think the way to shut it down is by calling coreContainer.shutdown(). However, is it possible to get the coreContainer from a SolrServer object? Right now it is defined as protected final CoreContainer coreContainer;. I wanted to do: ((EmbeddedSolrServer)solr).getCoreContainer().shutdown(); But it seems I need to keep my own reference to the coreContainer? Is changing this worth a patch? Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
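Until such an accessor exists, the workaround is to construct the container yourself and keep the reference, rather than trying to pull it back out of the server. A dependency-free sketch of that pattern (Container and Server here are hypothetical stand-ins for CoreContainer and EmbeddedSolrServer, not Solr classes):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for CoreContainer: a resource you must shut down yourself.
class Container implements AutoCloseable {
    final AtomicBoolean open = new AtomicBoolean(true);
    @Override public void close() { open.set(false); } // i.e. shutdown()
}

// Stand-in for EmbeddedSolrServer: wraps the container, but keeps it
// as a protected field with no public accessor.
class Server {
    protected final Container coreContainer;
    Server(Container c) { this.coreContainer = c; }
}

public class ShutdownSketch {
    public static void main(String[] args) {
        // Construct the container yourself, hand it to the server,
        // and keep your own reference so you can shut down cleanly.
        Container core = new Container();
        Server solr = new Server(core);
        core.close(); // the equivalent of coreContainer.shutdown()
        System.out.println(core.open.get()); // false
    }
}
```

Keeping the reference at construction time sidesteps the cast-and-accessor question entirely.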
Re: How to submit code improvement/suggestions
Yup! One thing though is that if you see some big changes you want to make, you should probably join the solr-dev list and broach the topic there first to make sure you are headed on the right path. The committers typically don't want to introduce change for change's sake, but cleanup and better code docs is always welcome on open source projects, and a great way to learn the code and the community. Eric On Apr 30, 2009, at 1:27 PM, Amit Nithian wrote: My apologies if this sounds like a silly question but for this project, how do I go about submitting code suggestions/improvements? They aren't necessarily bugs as such but rather just cleaning up some perceived strangeness (or even suggesting a package change). Would I need to create a JIRA ticket and submit a patch? Thanks Amit - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Is solr right for this scenario?
It seems like you have three components to your system: 1) Data indexing from multiple sources 2) Search for specific words in documents 3) Preserve rating and search term. I think that Solr comes into play on #1 and #2. You can index content in any number of approaches, either via the new DataImportHandler architecture, or the more traditional write-a-loader-script that puts the documents in Solr. You can store in Solr when a document was indexed, and use that to check against the original documents to see if they changed: check a last-published tag on an RSS feed, or the last updated time on a physical file. This is a very common use case for Solr. For #2, you could have users issue queries and make them favorites, storing them in the DB. Assuming they like the results, they mark the documents with the ratings, which you could store in Solr, but I would put in a DB; it's easier to manage "User A says 1, User B says 0". Then for the UI, just issue the search based on queries stored in the DB, and match the ids up with the rankings in the DB. Simple! As far as the last part, Solr works best on the filesystem; that is part of why it is so fast, no clunky SQL. There are scripts for backing up and restoring indexes that you can use, check the wiki http://wiki.apache.org/solr/SolrOperationsTools . Eric On Apr 24, 2009, at 6:18 AM, Developer In London wrote: Hi All, I am new to the whole Solr/Lucene community. But I think this might be the solution to what I am looking to do. I would appreciate any feedback on how I can go about doing this with Solr: I am looking to make a system where - a) mainly lots of different blog sites, web journals, and articles are indexed on a regular basis. Data that has already been indexed needs to be revisited to see if there are any changes. b) The end users have very fixed search terms, e.g. 'Lloyds TSB' and 'Corporate Banking'. All the documents that are found matching this are presented to a human to analyse.
c) Once the human analyses the document he gives it a rating of 1, 0 or -1. This rating needs to be saved somewhere and be linked with the specific document and also with the search term (e.g. 'Lloyds TSB' 'Corporate Banking' in this case). d) End users can then see these documents with the ratings next to them. What would be the best approach to this? Should I set up a different database to save the rating and relevant mappings, or is there any way to put it into Solr? My 2nd question is, can the Solr index be saved in a database in any way? What's the backup and recovery method on Solr? Thanks in advance. Nayeem - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
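The "match the ids up with the rankings in the DB" step from the reply above is plain bookkeeping, and a small sketch may make it concrete. Everything here is illustrative: the ids stand in for what Solr returns in relevance order, and the map stands in for a ratings table keyed by document id.

```java
import java.util.*;

public class MergeRatings {
    // Sketch: Solr supplies the matching document ids in relevance
    // order; the human-assigned ratings (1, 0, -1) live in the DB.
    static List<String> annotate(List<String> solrIds, Map<String, Integer> dbRatings) {
        List<String> out = new ArrayList<>();
        for (String id : solrIds) {
            // Documents nobody has rated yet simply have no rating.
            Integer rating = dbRatings.get(id);
            out.add(id + (rating == null ? " (unrated)" : " (rating " + rating + ")"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc1", "doc2", "doc3");   // from Solr
        Map<String, Integer> ratings = Map.of("doc1", 1, "doc3", -1); // from DB
        System.out.println(annotate(ids, ratings));
        // [doc1 (rating 1), doc2 (unrated), doc3 (rating -1)]
    }
}
```

The join happens in the application layer, so Solr stays purely a search index and the DB remains the system of record for ratings.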
Re: Is solr right for this scenario?
On Apr 24, 2009, at 7:54 AM, Developer In London wrote: Thanks for the fast reply. Wow, this seems a very active community. I have a few more questions in that case: 1) If Solr is going to be file-based, is it then preferable to run multiple Solrs with shards? How can I determine what capacity one Solr can cope with? It depends! Solr can manage up to X records easily in a single index, however your mileage may vary. One of the nice things about Solr is it is very scalable, and offers you many options. I would go with the most simple setup for Solr for now, and then as your development progresses and you load data, investigate sharding etc. Solr, properly managed, won't be your bottleneck; it'll be your data loading scripts or elsewhere. 2) I am presuming there are already tokenizers for hypertext and xml in Solr so that it can extract the right information out? There are a number of different options available out there for indexing content. 3) I need to also get the 'author' information out for things like blogs. I guess there's no universal way of doing it and I have to have someone manually go through the documents and feed the solr index with the author information? Your loading script will be bespoke to your situation, however any competent developer can put together scripts to load from your various data sources. When you mention 'write a loader script...', do you mean I should incorporate the date checking in the loader script? Solr has no internal way of checking the timestamp in a document and updating? Solr makes no assumptions about your data sources; it isn't a document management system, it is just a search engine. Well, that isn't totally true: the new DataImportHandler architecture does allow you to preserve some information about "when did I last run an update, what has been updated since", however it's pretty new stuff.
Eric Thanks, Nayeem 2009/4/24 Eric Pugh ep...@opensourceconnections.com -- cashflowclublondon.co.uk
Re: Adding text document
I would work through this tutorial and then ask specific questions: http://lucene.apache.org/solr/tutorial.html Alternatively there are some commercial support options: http://wiki.apache.org/solr/Support Eric On Mar 30, 2009, at 6:36 PM, nga pham wrote: Hi All, I am new to Solr. Can you please tell me, how can I add a text document? Thank you, Nga - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Best way to unit test solr integration
So my first thought is that "unit test" + "solr integration" is an oxymoron, in the sense that unit test implies the smallest functional unit, and Solr integration implies multiple units working together. It sounds like you have two different tasks. The code that generates queries you can test without Solr. If you need to parse some sort of Solr document to generate a query based on it, then mock up the query. A lot of folks will just use Solr to build a result set, and then save that on the filesystem as my_big_result1.xml, read it in, and feed it to your code. On the other hand, for your code testing indexing and retrieval, again, you can use the same approach to decouple what Solr does from your code. Unless you've patched Solr, you shouldn't need to unit test Solr; Solr has very nice unit testing built in. On the other hand, if you are doing integration testing, where you want a more end-to-end view of your application, then you probably already have a test Solr setup in your environment somewhere that you can rely on. Spinning up and shutting down Solr for tests can be done, and I can think of use cases for why you might want to do it, but it does incur a penalty of being more work. And you still need to validate that your embedded/unit test Solr works the same as your integration/test environment Solr. Eric On Mar 27, 2009, at 11:59 AM, Joe Pollard wrote: Hello, On our project, we have quite a bit of code used to generate Solr queries, and I need to create some unit tests to ensure that these continue to work. In addition, I need to generate some unit tests that will test indexing and retrieval of certain documents, based on our current schema and the application logic that generates the indexable documents as well as generates the Solr queries. My question is - what's the best way for me to unit test our Solr integration?
I'd like to be able to spin up an embedded/in-memory Solr, or failing that, just start one up as part of my test case setup, fill it with interesting documents, and do some queries, comparing the results to expected results. Are there wiki pages or other documented examples of doing this? It seems rather straight-forward, but who knows, it may be dead simple with some unknown feature. Thanks! -Joe - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
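The canned-result approach described above needs no Solr at test time: save a response like my_big_result1.xml once, then parse it in the test and feed it to your code. A sketch using only the JDK's DOM parser; the XML string is a trimmed, illustrative Solr 1.x-style response, not captured output.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CannedResponse {
    // A minimal stand-in for a saved Solr query response on disk.
    static final String CANNED =
        "<response>" +
        "  <result name=\"response\" numFound=\"2\" start=\"0\">" +
        "    <doc><str name=\"id\">SOLR-1</str></doc>" +
        "    <doc><str name=\"id\">SOLR-2</str></doc>" +
        "  </result>" +
        "</response>";

    // Parse the canned response and pull out numFound, the kind of
    // value your query-handling code would consume.
    static long numFound(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        Element result = (Element) doc.getElementsByTagName("result").item(0);
        return Long.parseLong(result.getAttribute("numFound"));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(numFound(CANNED)); // 2
    }
}
```

In a real test you would read the saved file instead of an inline string, but the decoupling is the same: the test exercises your parsing and query-generation code, not Solr.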
Re: Best way to unit test solr integration
So in the building block story you talked about, that sounds like an integration (functional? user acceptance?) test. And I would treat Solr the same way you treat the database that you are storing model objects in: if in your tests you bring up a fresh version of the db, populate it with tables etc., and put in sample data, then you should do the same with Solr. My guess is that you have a test database running, and therefore you need a live, supported test Solr. And the same processes you use so that two functional tests don't step on each other's data in the database can be applied to Solr! You can think of tweaking Solr config changes as similar to tweaking indexes in your db: both require configuration management to track those changes, ensure they are deployed, and don't regress anything. Let us know how you get on! Eric On Mar 27, 2009, at 12:50 PM, Joe Pollard wrote: Thanks for the tips, I like the suggestion of testing the document and query generation without having Solr involved. That seems like a more bite-sized unit; I think I'll do that. However, here's the test case that I'm considering where I'd like to have a live solr instance: During an exercise of optimizing our schema, I'm going to be making wholesale changes that I'd like to ensure don't break some portion of our app. It seems like a good method for this would be to write a test with the following steps: (arguably not a unit test, but a very valuable test indeed in our application) * take some defined model object generated at test time, store it in db * run it through our document creation code * submit it into solr * generate a query using our custom criteria-based generation code * ensure that the query returns the results as expected * flesh out the new model objects from the db using only the id fields returned from Solr * In the end, it would be expected to have model objects retrieved from the db that match model objects at the beginning of the test.
These building blocks could be stacked in numerous ways to test almost all the different scenarios in which we use Solr. Also, when/if we start making solr config changes, I can ensure that they change nothing from my app's functional point of view (with the exception of ridding us of dreaded OOMs). Thanks, -Joe -----Original Message----- From: Eric Pugh [mailto:ep...@opensourceconnections.com] Sent: Friday, March 27, 2009 11:27 AM To: solr-user@lucene.apache.org Subject: Re: Best way to unit test solr integration
Re: How do I accomplish this (semi-)complicated setup?
You could index the user name or ID, and then in your application add the username as a filter as you pass the query to Solr. Maybe have an access_type that is Public or Private, and then for public searches only include the ones that have an access_type of Public. Eric On Mar 25, 2009, at 12:52 PM, Jesper Nøhr wrote: Hi list, I've finally settled on Solr, seeing as it has almost everything I could want out of the box. My setup is a complicated one. It will serve as the search backend on Bitbucket.org, a mercurial hosting site. We have literally thousands of code repositories, as well as users and other data. All this needs to be indexed. The complication comes in when we have private repositories. Only select users have access to these, but we still need to index them. How would I go about accomplishing this? I can't think of a clean way to do it. Any pointers much appreciated. Jesper - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
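The filter suggested above reduces to a small piece of string-building. A sketch under stated assumptions: `access_type` and `owner` are illustrative field names, not from any real schema here, and production code would also escape user-supplied values before putting them in a query.

```java
import java.net.URLEncoder;

public class PrivateFilter {
    // Restrict results to public repositories plus the requesting
    // user's own private ones, expressed as a Solr filter clause.
    static String filterFor(String user) {
        return user == null
            ? "access_type:Public"                     // anonymous search
            : "access_type:Public OR owner:" + user;   // logged-in user
    }

    public static void main(String[] args) throws Exception {
        String fq = filterFor("jesper");
        // URL-encode the clause before appending it as an fq parameter.
        System.out.println("q=mercurial&fq=" + URLEncoder.encode(fq, "UTF-8"));
        // q=mercurial&fq=access_type%3APublic+OR+owner%3Ajesper
    }
}
```

Because the filter is applied server-side on every query, private documents never appear in results the requesting user isn't entitled to see.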
Re: Indexing the directory
Victor, I'd recommend looking at the tutorial at http://lucene.apache.org/solr/tutorial.html and using the list for more specific questions. Also, there's a list of companies (including mine!) that do Solr support at http://wiki.apache.org/solr/Support that eTrade can contract with to provide in-depth support. Eric Pugh On Mar 16, 2009, at 6:25 PM, Huang, Zijian(Victor) wrote: Hi, all: I am new to SOLR, can anyone please tell me what do I do to index some text files in a local directory? Thanks Victor - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Is wiki page still accurate
Folks, Is the section titled "Full Import Example" on http://wiki.apache.org/solr/DataImportHandler still accurate? The steps referring to the example-solr-home.jar and the SOLR-469 patch seem out of date with where 1.4 is today. Seems like the example-DIH stuff is a simpler/more direct example? Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: Organizing POJO's in a heirarchy in Solr
Solr really isn't organized for tree structures of data. I think you might do better using a database with a tree structure: pojo would be a table of serialized POJOs, and the parent_id could point to another structure that builds the tree. Can you flesh out your use case more, as to why they need to be in a tree structure? Eric On Mar 11, 2009, at 8:29 AM, PKJ wrote: Is there anyone who has any idea how to solve this issue? Please give your thoughts. Regards, Praveen PKJ wrote: Hi Eric, Thanks for your response. Yes you are right! Am trying to place POJOs into Solr directly and this is working fine. I want to search them based on the object properties, and need to organize them in a hierarchy, but not by package names. Something like:

/Repository
|_ Folder1
   |_ POJO 1

It must store the object in this hierarchy. I might be asking for something which is not at all supported by Solr. Please give your valuable inputs. Regards, Praveen Eric Pugh-4 wrote: Are you trying to store Java objects in Solr in order for them to be searchable? How about just dumping them as text using POJO-to-text formats such as JSON or Betwixt (http://commons.apache.org/betwixt/). Then you can just search on the package structure... ?q=com.abc.lucene.* to return everything under that structure? Eric On Mar 10, 2009, at 7:13 AM, Praveen_Kumar_J wrote: Someone please throw some light on this post. Thanks in advance. Praveen_Kumar_J wrote: Hi I just upload simple POJOs into Solr by creating custom types and dynamic fields in the Solr schema as shown below, ... <fieldType name="TestType" class="com.abc.lucene.TestType" sortMissingLast="true" omitNorms="true"/> <dynamicField name="*_i_i_s_m" type="integer" indexed="true" stored="true" multiValued="true"/> <dynamicField name="*_i_i_s_nm" type="integer" indexed="true" stored="true" multiValued="false"/> <dynamicField name="*_i_i_ns_m" type="integer" indexed="true" stored="false" multiValued="true"/> But I need to organize these POJOs in a hierarchy which can be navigated easily (something like explorer).
Am not sure whether this feature is supported by Solr. But still planning to implement it somehow (with the help of a DB).

/Root
|_ POJO Type1
|  |_ POJO Type1_1
|_ POJO Type2
   |_ POJO Type2_1

I need to organize the POJOs as shown above. Is there any way to achieve this requirement? Regards, Praveen -- View this message in context: http://www.nabble.com/Organizing-POJO%27s-in-a-heirarchy-in-Solr-tp22427900p22432121.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal -- View this message in context: http://www.nabble.com/Organizing-POJO%27s-in-a-heirarchy-in-Solr-tp22427900p22454101.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
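The DB-backed hierarchy Eric suggests is a classic adjacency list: each row stores (id, parent_id), Solr holds the searchable POJO fields, and the explorer-style tree is rebuilt from the rows. A dependency-free sketch (names like Type1 mirror the diagram above and are illustrative only):

```java
import java.util.*;

public class PojoTree {
    // Build a child-list view of an adjacency-list table: each row is
    // {id, parentId}, with a null parentId marking a root node.
    static Map<String, List<String>> childrenByParent(String[][] rows) {
        Map<String, List<String>> tree = new LinkedHashMap<>();
        for (String[] row : rows) {
            tree.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        return tree;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Type1", null}, {"Type1_1", "Type1"},
            {"Type2", null}, {"Type2_1", "Type2"},
        };
        Map<String, List<String>> tree = childrenByParent(rows);
        System.out.println(tree.get(null));     // [Type1, Type2]
        System.out.println(tree.get("Type1"));  // [Type1_1]
    }
}
```

Search still goes through Solr on object properties; only the navigation structure lives in the database, which is exactly the split the reply recommends.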
Re: Organizing POJO's in a heirarchy in Solr
Are you trying to store Java objects in Solr in order for them to be searchable? How about just dumping them as text using POJO-to-text formats such as JSON or Betwixt (http://commons.apache.org/betwixt/). Then you can just search on the package structure... ?q=com.abc.lucene.* to return everything under that structure? Eric On Mar 10, 2009, at 7:13 AM, Praveen_Kumar_J wrote: Someone please throw some light on this post. Thanks in advance. Praveen_Kumar_J wrote: Hi I just upload simple POJOs into Solr by creating custom types and dynamic fields in the Solr schema as shown below, ... <fieldType name="TestType" class="com.abc.lucene.TestType" sortMissingLast="true" omitNorms="true"/> <dynamicField name="*_i_i_s_m" type="integer" indexed="true" stored="true" multiValued="true"/> <dynamicField name="*_i_i_s_nm" type="integer" indexed="true" stored="true" multiValued="false"/> <dynamicField name="*_i_i_ns_m" type="integer" indexed="true" stored="false" multiValued="true"/> But I need to organize these POJOs in a hierarchy which can be navigated easily (something like explorer). Am not sure whether this feature is supported by Solr. But still planning to implement it somehow (with the help of a DB).

/Root
|_ POJO Type1
|  |_ POJO Type1_1
|_ POJO Type2
   |_ POJO Type2_1

I need to organize the POJOs as shown above. Is there any way to achieve this requirement? Regards, Praveen -- View this message in context: http://www.nabble.com/Organizing-POJO%27s-in-a-heirarchy-in-Solr-tp22427900p22432121.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal