Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Ophir Adiv
[I posted this yesterday to the lucene-user mailing list and was advised to
post it here instead. Apologies for the cross-post.]

Hi,

I'm currently involved in a project of migrating from Lucene 2.9.1 to Solr
1.4.0.
During stress testing, I encountered this performance problem:
While actual search times in our shards (which are now running Solr) have
not changed, the total time it takes for a query has increased dramatically.
During this performance test, we of course do not modify the indexes.
Our application is sending Solr select queries concurrently to the 8 shards,
using CommonsHttpSolrServer.
I added some timing debug messages, and found that
CommonsHttpSolrServer.java, line 416 takes about 95% of the application's
total search time:
int statusCode = _httpClient.executeMethod(method);

Just to clarify: looking at the access logs of the Solr shards, TTLB for a query
might be around 5 ms (on all shards), but httpClient.executeMethod() for
this query can be much higher - say, 50 ms.
Under light load, queries take around 12 ms on average; under heavy
load they take around 22 ms.

Another route we tried to pursue was adding the shards=shard1,shard2,…
parameter to the query instead of doing this ourselves, but that doesn't
seem to work, due to an NPE caused by QueryComponent.returnFields(), line
553:
if (returnScores && sdoc.score != null) {

where sdoc is null. I saw there is a null check on trunk, but since we're
currently using Solr 1.4.0's ready-made WAR file, I didn't see an easy way
around this.
Note: we're using a custom query component which extends QueryComponent, but
debugging this, I saw nothing wrong with the results at this point in the
code.

Our previous code used HTTP in a different manner:
For each request, we created a new
sun.net.www.protocol.http.HttpURLConnection, and called its getInputStream()
method.
Under the same load as the new application, the old application does not
encounter the delays mentioned above.

Our current code is initializing CommonsHttpSolrServer for each shard this
way:
MultiThreadedHttpConnectionManager httpConnectionManager = new
MultiThreadedHttpConnectionManager();
httpConnectionManager.getParams().setTcpNoDelay(true);
httpConnectionManager.getParams().setMaxTotalConnections(1024);
httpConnectionManager.getParams().setStaleCheckingEnabled(false);
HttpClient httpClient = new HttpClient();
HttpClientParams params = new HttpClientParams();
params.setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
params.setAuthenticationPreemptive(false);
params.setContentCharset(StringConstants.UTF8);
httpClient.setParams(params);
httpClient.setHttpConnectionManager(httpConnectionManager);

and passing the new HttpClient to the Solr Server:
solrServer = new CommonsHttpSolrServer(coreUrl, httpClient);

We tried two different setups - one with a single
MultiThreadedHttpConnectionManager and HttpClient shared by all the SolrServers,
and the other with a new MultiThreadedHttpConnectionManager and HttpClient
for each SolrServer.
Both yielded similar performance results.
We also tried giving setMaxTotalConnections() a much higher value
(1,000,000) - it had no effect.
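One thing worth checking (an assumption on my part, not something stated in this thread): in Commons HttpClient 3.x, MultiThreadedHttpConnectionManager also enforces a per-host connection limit that defaults to 2 and is NOT raised by setMaxTotalConnections(). With 8 shards under concurrent load, that cap can serialize requests to each shard and would produce exactly this kind of queueing delay. A sketch of raising it, reusing the httpConnectionManager variable from the snippet above:

```java
// Hypothetical tweak: raise the per-host cap as well as the total cap.
// In HttpClient 3.x, maxConnectionsPerHost defaults to 2.
httpConnectionManager.getParams().setDefaultMaxConnectionsPerHost(1024);
httpConnectionManager.getParams().setMaxTotalConnections(1024);
```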

One last thing - to answer Lance's question (in the lucene-user thread) about
this being an apples-to-apples comparison: yes, our main goal in this
project is to keep things as close to the previous version as possible.
This way we can monitor that behavior (both quality and performance) remains
similar, release this version, and then move forward to improve things.
Of course, there are some changes, but I believe we are indeed measuring the
complete flow on both apps, and that both apps are returning the same fields
via HTTP.

Would love to hear what you think about this. TIA,
Ophir


Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420

I would like to set up Apache Solr in Eclipse using Tomcat. It is easy to
set up with Jetty, but with Tomcat it doesn't run Solr at runtime. Has anyone
done this before?

Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1021673.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Peter Karich
Ophir,

this sounds a bit strange:

> CommonsHttpSolrServer.java, line 416 takes about 95% of the application's
> total search time

Is this only for heavy load?

Some other things:

 * with lucene you accessed the indices with MultiSearcher in a LAN, right?
 * did you look into the logs of the servers, is there something
wrong/delayed?
 * did you enable gzip compression for your servers or even the binary
writer/parser for your solr clients?

CommonsHttpSolrServer server = ...
server.setRequestWriter(new BinaryRequestWriter());
server.setParser(new BinaryResponseParser());

Regards,
Peter.



-- 
http://karussell.wordpress.com/



RE: wildcard and proximity searches

2010-08-04 Thread Frederico Azeiteiro
Thanks for your idea.

At this point I'm logging each query's time. My idea is to divide my
queries into normal queries and heavy queries. I have some heavy
queries that take 1 or 2 minutes to return results. But they contain, for
instance, (*word1* AND *word2* AND word3*). I guess these will
always be slower (they could be a little faster with
ReversedWildcardFilterFactory), but they will never be ready in a few
seconds. For now, I just increased the timeout for those :) (using
SolrNet).

My priority at the moment is phrase queries like word1* word2*
word3. After that is working, I'll try to optimize the heavy queries.
Frederico


-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu] 
Sent: quarta-feira, 4 de Agosto de 2010 01:41
To: solr-user@lucene.apache.org
Subject: Re: wildcard and proximity searches

Frederico Azeiteiro wrote:
>> But it is unusual to use both leading and trailing * operator. Why are
>> you doing this?
>
> Yes I know, but I have a few queries that need this. I'll try the
> ReversedWildcardFilterFactory.

ReversedWildcardFilterFactory will help with a leading wildcard, but will not
help a query with BOTH leading and trailing wildcards; it'll still be slow.
Solr/Lucene isn't good at that; in fact, I didn't even know Solr would do it
at all.

If you really needed to do that, the way to play to Solr/Lucene's strengths
would be to have a field where you actually index each _character_ as a
separate token. Then a leading-and-trailing wildcard search is basically
reduced to a phrase search, but where the words are actually characters.
But then you're going to get an index where pretty much every token belongs
to every document, which Solr isn't that great at either; you can apply
common-grams stuff on top to help that out a lot. Not quite sure what the
end result would be - I've never tried it. I'd only use that weird
character-as-token field for queries that actually require leading and
trailing wildcards.

Figuring out how to set up your analyzers, and what (if anything) you're
going to have to do client-app-side to transform the user's query into
something that'll end up searching like a phrase search where each
'word' is a character, is left as an exercise for the reader. :)
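A rough sketch of the character-as-token idea in plain Java (hypothetical helper names; a real setup would do this inside a Solr analyzer on the index side and a query transformer on the client side):

```java
import java.util.List;
import java.util.stream.Collectors;

public class CharTokens {
    // Index side: emit each character of a term as its own token.
    static List<String> charTokens(String term) {
        return term.chars()
                   .mapToObj(c -> String.valueOf((char) c))
                   .collect(Collectors.toList());
    }

    // Query side: a double-wildcard query like *hoe* reduces to a
    // phrase query over the character tokens.
    static String asPhrase(String infix) {
        return "\"" + String.join(" ", charTokens(infix)) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(charTokens("shoes")); // [s, h, o, e, s]
        System.out.println(asPhrase("hoe"));     // "h o e"
    }
}
```

Common-grams would then collapse frequent adjacent character pairs into single tokens to keep the postings lists manageable.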

Jonathan


AW: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Bastian Spitzer
I'm not sure I understand your problem, but basically it isn't Solr vs. Lucene
but HttpURLConnection vs. SolrJ's CommonsHttpSolrServer, since the server
query times haven't changed at all from what you say?

Why aren't you querying the server the same way you did before, if you want
to compare Solr to Lucene only?

-----Original Message-----
From: Ophir Adiv [mailto:firt...@gmail.com]
Sent: Wednesday, August 4, 2010 09:11
To: solr-user@lucene.apache.org
Subject: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under
heavy load



Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Ophir Adiv
On Wed, Aug 4, 2010 at 10:50 AM, Peter Karich peat...@yahoo.de wrote:

> Ophir,
>
> this sounds a bit strange:
>
>> CommonsHttpSolrServer.java, line 416 takes about 95% of the application's
>> total search time
>
> Is this only for heavy load?

I think this makes sense, since the hard work is done by Solr - once the
application gets the search results from the shards, it does a bit of
manipulation on them (combining, filtering, ...), but these are easy tasks.

> Some other things:
>
>  * with lucene you accessed the indices with MultiSearcher in a LAN, right?

No, each shard ran under a different Tomcat instance, and each shard was
accessed via HTTP calls (the same way we're now trying to work with Solr).


>  * did you look into the logs of the servers, is there something
> wrong/delayed?

Everything seems peachy... logs are clean of errors/warnings and the like.


>  * did you enable gzip compression for your servers or even the binary
> writer/parser for your solr clients?

We're running our application (and Solr) under Tomcat. We do not enable
compression (the configuration remained similar to our old application's
configuration).
We tried XMLResponseParser instead of BinaryResponseParser - it hardly
affected run times.

Thanks for the ideas,
Ophir


Is there a better way to do server-side load balancing for Solr?

2010-08-04 Thread Chengyang
The default Solr setup does client-side load balancing.
Is there a solution that provides server-side load balancing?



Support loading queries from external files in QuerySenderListener

2010-08-04 Thread Stanislaw
Hi all!
I can't load my custom queries from an external file, as described here:
https://issues.apache.org/jira/browse/SOLR-784

This option seems not to be implemented in the current version 1.4.1 of Solr.
Was it removed, or does it only arrive in a newer version?

regards,
Stanislaw


Re: Date faceting

2010-08-04 Thread Koji Sekiguchi

(10/08/04 19:42), Eric Grobler wrote:
> Hi Solr community,
>
> How do I facet on timestamp for example?
>
> I tried something like this - but I get no result.
>
> facet=true
> facet.date=timestamp
> f.facet.timestamp.date.start=2010-01-01T00:00:00Z
> f.facet.timestamp.date.end=2010-12-31T00:00:00Z
> f.facet.timestamp.date.gap=+1HOUR
> f.facet.timestamp.date.hardend=true
>
> Thanks
> ericz

Your parameters are not correct. Try:

facet=true
facet.date=timestamp
facet.date.start=2010-01-01T00:00:00Z
facet.date.end=2010-12-31T00:00:00Z
facet.date.gap=+1HOUR
facet.date.hardend=true

If you want to use per-field override feature, you can set them:

f.timestamp.facet.date.start=2010-01-01T00:00:00Z
f.timestamp.facet.date.end=2010-12-31T00:00:00Z
f.timestamp.facet.date.gap=+1HOUR
f.timestamp.facet.date.hardend=true
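To make the shape of these parameters concrete, here is a small sketch (plain Java, stdlib only; the field name and date range are the ones from this thread) that assembles the per-field override form above into a query string:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class DateFacetParams {
    // Build the per-field-override parameter map: f.<fieldName>.facet.date.*
    static Map<String, String> params(String field) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("facet", "true");
        p.put("facet.date", field);
        p.put("f." + field + ".facet.date.start", "2010-01-01T00:00:00Z");
        p.put("f." + field + ".facet.date.end", "2010-12-31T00:00:00Z");
        p.put("f." + field + ".facet.date.gap", "+1HOUR");
        p.put("f." + field + ".facet.date.hardend", "true");
        return p;
    }

    public static void main(String[] args) {
        // Join into a raw query string (URL-encoding omitted for clarity;
        // note "+1HOUR" must be encoded as %2B1HOUR in a real request).
        String qs = params("timestamp").entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        System.out.println(qs);
    }
}
```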

Koji

--
http://www.rondhuit.com/en/



Re: Best solution to avoiding multiple query requests

2010-08-04 Thread kenf_nc

Not sure the processing would be any faster than just querying again, but in
your original result set, the first doc whose field value matches a top-10
facet will be the number 1 item if you fq on that facet value. So you don't
need to query it again. You would only need to query the facet values that
aren't represented in your result set.
ie:
   q=dog&facet=on&facet.field=foo
results 10 docs
   id=1, foo=A
   id=2, foo=A
   id=3, foo=B
   id=4, foo=C
   id=5, foo=B
   id=6, foo=A
   id=7, foo=Z
   id=8, foo=T
   id=9, foo=B
   id=10, foo=J

If your facet results' top 10 were (A, B, T, J, D, X, Q, O, P, I),
you already have the number 1 hit for A (id 1), B (id 3), T (id 8) and J (id
10) from your very first query. You only need to query D, X, Q, O, P, I.

If your first query returned 100 instead of 10 you may even have more of the
top 10 represented. Again, the processing steps you would need to do may not
be any faster than re-querying, it depends on the speed of your index and
network etc.

I would think that if your second query was
q=dog&fq=(foo=A OR foo=B OR foo=T ...etc) then you have an even greater
chance of having the number 1 result for each of the top 10 in just your
second query.
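The bookkeeping described above can be sketched in plain Java (a hypothetical helper, stdlib only; ranks are 0-based, so ids 1, 3, 8, 10 from the example appear as ranks 0, 2, 7, 9):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetTopHits {
    // Given the initial result docs' facet values (in rank order) and the
    // top facet values, return the facets whose #1 doc is already in hand,
    // mapped to the rank of that doc.
    static Map<String, Integer> firstHits(List<String> docFacetValues,
                                          List<String> topFacets) {
        Map<String, Integer> first = new LinkedHashMap<>();
        for (int rank = 0; rank < docFacetValues.size(); rank++) {
            first.putIfAbsent(docFacetValues.get(rank), rank); // earliest rank wins
        }
        Map<String, Integer> known = new LinkedHashMap<>();
        for (String f : topFacets) {
            if (first.containsKey(f)) known.put(f, first.get(f));
        }
        return known;
    }

    public static void main(String[] args) {
        // The example from the post: 10 docs with their foo values.
        List<String> docs = List.of("A","A","B","C","B","A","Z","T","B","J");
        List<String> top10 = List.of("A","B","T","J","D","X","Q","O","P","I");
        Map<String, Integer> known = firstHits(docs, top10);
        System.out.println(known); // facets already covered by the first query
        List<String> remaining = new ArrayList<>(top10);
        remaining.removeAll(known.keySet());
        System.out.println(remaining); // still need follow-up fq queries
    }
}
```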

  


Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Geert-Jan Brits
Field Collapsing (currently available as a patch) is exactly what you're
looking for, imo.

http://wiki.apache.org/solr/FieldCollapsing

Geert-Jan


2010/8/4 Ken Krugler kkrugler_li...@transpac.com

> Hi all,
>
> I've got a situation where the key result from an initial search request
> (let's say for dog) is the list of values from a faceted field, sorted by
> hit count.
>
> For the top 10 of these faceted field values, I need to get the top hit for
> the target request (dog) restricted to that value for the faceted field.
>
> Currently this is 11 total requests, of which the 10 requests following the
> initial query can be made in parallel. But that's still a lot of requests.
>
> So my questions are:
>
> 1. Is there any magic query to handle this with Solr as-is?
>
> 2. If not, is the best solution to create my own request handler?
>
> 3. And in that case, any input/tips on developing this type of custom
> request handler?
>
> Thanks,
>
> -- Ken
>
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g






Re: Date faceting

2010-08-04 Thread Eric Grobler
Thanks Koji,

It works :-)

Have a nice day.

regards
ericz




Re: Multi word synomyms

2010-08-04 Thread Qwerky

It would be nice if you could configure some kind of filter to be processed
before the query string is passed to the parser. The QueryComponent class
seems a nice place for this; a filter could be run against the raw query and
ResponseBuilder's queryString value could be modified before the QParser is
created.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synomyms-tp1019722p1022461.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil
I got Solr working in Eclipse and deployed on Tomcat through the Eclipse
plugin.
The crude approach was to:

   1. Import the Solr WAR into Eclipse; it will be imported as a web
   project and can be deployed on Tomcat.
   2. Add multiple source folders to the project, linked to the checked-out
   Solr source code, e.g. this entry in the .project file:
   <linkedResources>
     <link>
       <name>common</name>
       <type>2</type>
       <location>D:/Solr/solr/src/common</location>
     </link>
     ...
   </linkedResources>
   3. Remove the Solr jars from WEB-INF/lib, so that changes to the
   project sources can be deployed and debugged.

Let me know if you find a better approach.




analysis tool vs. reality

2010-08-04 Thread Justin Lolofie
Erik: Yes, I did re-index if that means adding the document again.
Here are the exact steps I took:

1. analysis.jsp ABC12 does NOT match title ABC12 (however, ABC or 12 does)
2. changed schema.xml WordDelimiterFilterFactory catenate-all
3. restarted tomcat
4. deleted the document with title ABC12
5. added the document with title ABC12
6. query ABC12 does NOT result in the document with title ABC12
7. analysis.jsp ABC12 DOES match that document now

Is there any way to see, given an ID, how something is indexed internally?

Lance: I understand the index/query sections of analysis.jsp. However,
it operates on text that you enter into the form, not on actual index
data. Since all my documents have a unique ID, I'd like to supply an
ID and a query, and get back the same index/query sections - using
what's actually in the index.


-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 22:43:17 -0400
Subject: Re: analysis tool vs. reality
Did you reindex after changing the schema?


On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows
information about how the query is processed- is there a way to see
how things are stored internally in the index?

My query is ABC12. There is a document whose title field is
ABC12. However, I can only get it to match if I search for ABC or
12. This was also true in the analysis tool up until recently.
However, I changed schema.xml and turned on catenate-all in
WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis
tool, ABC12 matches ABC12. However, when doing an actual query, it
does not match.

Thank you for any help,
Justin


-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality
The analysis tool is merely that, but during querying there is also a
query parser involved. Adding debugQuery=true to your request will
give you the parsed query in the response, offering insight into what
might be going on. It could be lots of things, from not querying the
fields you think you are, to a misunderstanding about some text not
being analyzed (like wildcard clauses).

 Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

  Hello,

  I have found the analysis tool in the admin page to be very useful in
  understanding my schema. I've made changes to my schema so that a
  particular case I'm looking at matches properly. I restarted solr,
  deleted the document from the index, and added it again. But still,
  when I do a query, the document does not get returned in the results.

  Does anyone have any tips for debugging this sort of issue? What is
  different between what I see in analysis tool and new documents added
  to the index?

  Thanks,
  Justin


No group by? looking for an alternative.

2010-08-04 Thread Mickael Magniez

Hello,

I've been dealing with a problem for a few days: I want to index and search
shoes; each shoe can have several sizes and colors, at different prices.

So what I want is: when I search for Converse, I want to retrieve one
shoe per model, i.e. one color and one size, but with colors and sizes in
facets.

My first idea was to copy SQL behaviour with a SELECT * FROM solr WHERE
text CONTAINS 'converse' GROUP BY model.
But there is no GROUP BY in Solr :(. I tried FieldCollapsing, but hit many
bugs (NullPointerException).

Then I tried with multivalued facets:
<field name="size" type="string" indexed="true" stored="true"
multiValued="true"/>
<field name="color" type="string" indexed="true" stored="true"
multiValued="true"/>

It's nearly working, but I have a problem: when I filter on red shoes, the
size facet also shows sizes that are not available in red. I can't find any
solution to filter one multivalued facet by the value of another
multivalued facet.

So if anyone has an idea for solving this problem...
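One workaround sometimes used for this kind of cross-field facet mismatch - an assumption on my part, not something suggested in this thread - is to index a combined color_size token for each actually-available variant, so that a facet restricted to red can only ever show sizes that exist in red. A sketch of building such tokens (hypothetical field design, plain Java):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VariantTokens {
    // Build one "color_size" token per available variant of a shoe, so the
    // color/size pairing is preserved inside a single multivalued field.
    static List<String> variantTokens(Map<String, List<String>> sizesByColor) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : sizesByColor.entrySet())
            for (String size : e.getValue())
                tokens.add(e.getKey() + "_" + size);
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, List<String>> shoe = new LinkedHashMap<>();
        shoe.put("red", List.of("40", "42"));  // red exists in 40 and 42 only
        shoe.put("blue", List.of("41"));       // blue exists in 41 only
        System.out.println(variantTokens(shoe)); // [red_40, red_42, blue_41]
    }
}
```

Faceting on such a combined field (and splitting the tokens client-side) avoids pairing a color with a size that no variant actually has.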



Mickael.



Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
I think I agree with Justin here; I think the way the analysis tool
highlights 'matches' is extremely misleading, especially considering it
completely ignores query parsing.

It would be better if it put your text in a MemoryIndex, actually parsed
the query with the query parser, ran it, and used the highlighter to try to
show any matches.

On Wed, Aug 4, 2010 at 10:14 AM, Justin Lolofie jta...@gmail.com wrote:

 Erik: Yes, I did re-index if that means adding the document again.
 Here are the exact steps I took:

 1. analysis.jsp: "ABC12" does NOT match title "ABC12" (however, "ABC" or "12" does)
 2. changed schema.xml: WordDelimiterFilterFactory catenateAll
 3. restarted tomcat
 4. deleted the document with title "ABC12"
 5. added the document with title "ABC12"
 6. query "ABC12" does NOT result in the document with title "ABC12"
 7. analysis.jsp: "ABC12" DOES match that document now

 Is there any way to see, given an ID, how something is indexed internally?

 Lance: I understand the index/query sections of analysis.jsp. However,
 it operates on text that you enter into the form, not on actual index
 data. Since all my documents have a unique ID, I'd like to supply an
 ID and a query, and get back the same index/query sections - using
 what's actually in the index.
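One built-in way to see how a given document was actually indexed is the Luke request handler (/admin/luke), which can report the indexed terms of a document looked up by its unique key. The sketch below just builds such a request URL; the host/port and the exact parameter names (`id`, `numTerms`) are my recollection of the 1.4-era handler and should be verified against your install:

```python
from urllib.parse import urlencode

def luke_url(base="http://localhost:8983/solr", unique_key_value="ABC12", num_terms=50):
    """Build a Luke request that shows how the document with the given
    unique key was actually indexed (per-field terms as stored in the index)."""
    params = {"id": unique_key_value, "numTerms": num_terms, "wt": "json"}
    return base + "/admin/luke?" + urlencode(params)

# e.g. fetch with urllib.request.urlopen(luke_url()) and inspect the
# per-field "index" entries in the response.
print(luke_url())
```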


 -- Forwarded message --
 From: Erik Hatcher erik.hatc...@gmail.com
 To: solr-user@lucene.apache.org
 Date: Tue, 3 Aug 2010 22:43:17 -0400
 Subject: Re: analysis tool vs. reality
 Did you reindex after changing the schema?


 On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows
information about how the query is processed- is there a way to see
how things are stored internally in the index?

My query is "ABC12". There is a document whose title field is
"ABC12". However, I can only get it to match if I search for "ABC" or
"12". This was also true in the analysis tool up until recently.
However, I changed schema.xml and turned on catenateAll in
WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis
tool, "ABC12" matches "ABC12". However, when doing an actual query, it
does not match.

Thank you for any help,
Justin


-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality
The analysis tool is merely that, but during querying there is also a
query parser involved.  Adding debugQuery=true to your request will
give you the parsed query in the response, offering insight into what
might be going on.  It could be lots of things, from not querying the
fields you think you are, to a misunderstanding about some text not
being analyzed (like wildcard clauses).

 Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

  Hello,

  I have found the analysis tool in the admin page to be very useful in
  understanding my schema. I've made changes to my schema so that a
  particular case I'm looking at matches properly. I restarted solr,
  deleted the document from the index, and added it again. But still,
  when I do a query, the document does not get returned in the results.

  Does anyone have any tips for debugging this sort of issue? What is
  different between what I see in analysis tool and new documents added
  to the index?

  Thanks,
   Justin




-- 
Robert Muir
rcm...@gmail.com


analysis tool vs. reality

2010-08-04 Thread Justin Lolofie
Wow, I got to work this morning and my query results now include the
'ABC12' document. I'm not sure what that means. Either I made a
mistake in the process I described in the last email (I don't think
this is the case) or there is some kind of caching of query results
going on that doesn't get flushed on a restart of tomcat.




Erik: Yes, I did re-index if that means adding the document again.
Here are the exact steps I took:

1. analysis.jsp: "ABC12" does NOT match title "ABC12" (however, "ABC" or "12" does)
2. changed schema.xml: WordDelimiterFilterFactory catenateAll
3. restarted tomcat
4. deleted the document with title "ABC12"
5. added the document with title "ABC12"
6. query "ABC12" does NOT result in the document with title "ABC12"
7. analysis.jsp: "ABC12" DOES match that document now

Is there any way to see, given an ID, how something is indexed internally?

Lance: I understand the index/query sections of analysis.jsp. However,
it operates on text that you enter into the form, not on actual index
data. Since all my documents have a unique ID, I'd like to supply an
ID and a query, and get back the same index/query sections - using
what's actually in the index.


-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 22:43:17 -0400
Subject: Re: analysis tool vs. reality
Did you reindex after changing the schema?


On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows
information about how the query is processed- is there a way to see
how things are stored internally in the index?

My query is "ABC12". There is a document whose title field is
"ABC12". However, I can only get it to match if I search for "ABC" or
"12". This was also true in the analysis tool up until recently.
However, I changed schema.xml and turned on catenateAll in
WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis
tool, "ABC12" matches "ABC12". However, when doing an actual query, it
does not match.

Thank you for any help,
Justin


-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality
The analysis tool is merely that, but during querying there is also a
query parser involved.  Adding debugQuery=true to your request will
give you the parsed query in the response, offering insight into what
might be going on.  It could be lots of things, from not querying the
fields you think you are, to a misunderstanding about some text not
being analyzed (like wildcard clauses).

 Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

  Hello,

  I have found the analysis tool in the admin page to be very useful in
  understanding my schema. I've made changes to my schema so that a
  particular case I'm looking at matches properly. I restarted solr,
  deleted the document from the index, and added it again. But still,
  when I do a query, the document does not get returned in the results.

  Does anyone have any tips for debugging this sort of issue? What is
  different between what I see in analysis tool and new documents added
  to the index?

  Thanks,
  Justin


Re: Support loading queries from external files in QuerySenderListener

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 3:27 PM, Stanislaw solrgeschic...@googlemail.com wrote:

 Hi all!
 I can't load my custom queries from the external file, as written here:
 https://issues.apache.org/jira/browse/SOLR-784

 This option seems not to be implemented in the current version 1.4.1 of
 Solr. Was it removed, or does it only come with a newer version?


That patch was never committed so it is not available in any release.

-- 
Regards,
Shalin Shekhar Mangar.


Re: analysis tool vs. reality

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 7:52 PM, Robert Muir rcm...@gmail.com wrote:

 I think I agree with Justin here, I think the way analysis tool highlights
 'matches' is extremely misleading, especially considering it completely
 ignores queryparsing.

 it would be better if it put your text in a memoryindex and actually parsed
 the query w/ queryparser, ran it, and used the highlighter to try to show
 any matches.


+1

-- 
Regards,
Shalin Shekhar Mangar.


Re: Is there a better way for Solr server-side load balancing?

2010-08-04 Thread Shalin Shekhar Mangar
2010/8/4 Chengyang atreey...@163.com

 The default Solr setup is client-side load balancing.
 Is there a solution that provides server-side load balancing?


No. Most of us stick an HTTP load balancer in front of multiple Solr servers.

-- 
Regards,
Shalin Shekhar Mangar.


DIH and Cassandra

2010-08-04 Thread Mark
Is it possible to use DIH with Cassandra either out of the box or with 
something more custom? Thanks


Re: enhancing auto complete

2010-08-04 Thread Avlesh Singh
I preferred to answer this question privately earlier. But I have received
innumerable requests to unveil the architecture. For the benefit of all, I
am posting it here (after hiding as much info as I should, in my company's
interest).

The context: Auto-suggest feature on http://askme.in

*Solr setup*: Below are some of the salient features -

   1. TermsComponent is NOT used.
   2. The index is made up of 4 fields of the following types -
   autocomplete_full, autocomplete_token, string and text.
   3. autocomplete_full uses KeywordTokenizerFactory and
   EdgeNGramFilterFactory. autocomplete_token uses WhitespaceTokenizerFactory
   and EdgeNGramFilterFactory. Both of these are Solr text fields with standard
   filters like LowerCaseFilterFactory etc applied during querying and
   indexing.
   4. Standard DataImportHandler and a bunch of sql procedures are used to
   derive all suggestable phrases from the system and index them in the above
   mentioned fields.
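The effect of pairing the two tokenizers with EdgeNGramFilterFactory can be sketched without Solr: KeywordTokenizer + edge n-grams yields prefixes of the whole phrase (so "lorem ip" matches "Lorem ipsum dolor sit amet"), while WhitespaceTokenizer + edge n-grams yields prefixes of every word (so "ips" matches mid-phrase). A hypothetical, simplified re-implementation, with lowercasing standing in for LowerCaseFilterFactory:

```python
def edge_ngrams(term, min_gram=1, max_gram=25):
    """Front-edge n-grams of a single token, lowercased."""
    term = term.lower()
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]

def analyze_full(phrase):
    # KeywordTokenizer: the whole phrase is one token, so grams are phrase prefixes
    return set(edge_ngrams(phrase))

def analyze_tokens(phrase):
    # WhitespaceTokenizer: one token per word, so grams are word prefixes
    grams = set()
    for word in phrase.split():
        grams.update(edge_ngrams(word))
    return grams

doc = "Lorem ipsum dolor sit amet"
assert "lorem ip" in analyze_full(doc)       # phrase-prefix match
assert "lorem ip" not in analyze_tokens(doc)
assert "ips" in analyze_tokens(doc)          # word-prefix match mid-phrase
```

Querying both fields with different boosts (as the controller below does) is what lets phrase-prefix matches rank above mere word-prefix matches.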

*Controller setup*: The controller (to handle suggest queries) is a typical
Java servlet using Solr as its backend (connecting via solrj). Based on the
incoming query string, a Lucene query is created. It is a BooleanQuery
comprising a TermQuery for each of the above-mentioned fields. The boost
factor on each of these term queries determines (to an extent) what kind of
matches show up first. JSON is used as the data exchange format.

*Frontend setup*: It is a home-grown JS component addressing some specific
use cases of the project in question. One simple exercise with Firebug will
spill all the beans. However, I strongly recommend using jQuery to build (and
extend) the UI component.

Any help beyond this is available, but off the list.

Cheers
Avlesh
@avlesh http://twitter.com/avlesh | http://webklipper.com

On Tue, Aug 3, 2010 at 10:04 AM, Bhavnik Gajjar 
bhavnik.gaj...@gatewaynintec.com wrote:

  Whoops!

 The table still doesn't look OK :(

 Trying to send it once again:

 lorem      -> Lorem ipsum dolor sit amet
               Hieyed ddi lorem ipsum dolor
               test lorem ipsume
               test xyz lorem ipslili

 lorem ip   -> Lorem ipsum dolor sit amet
               Hieyed ddi lorem ipsum dolor
               test lorem ipsume
               test xyz lorem ipslili

 lorem ipsl -> test xyz lorem ipslili

 On 8/3/2010 10:00 AM, Bhavnik Gajjar wrote:

 Avlesh,

 Thanks for responding

 The table mentioned below looks like:

 lorem      -> Lorem ipsum dolor sit amet
               Hieyed ddi lorem ipsum dolor
               test lorem ipsume
               test xyz lorem ipslili

 lorem ip   -> Lorem ipsum dolor sit amet
               Hieyed ddi lorem ipsum dolor
               test lorem ipsume
               test xyz lorem ipslili

 lorem ipsl -> test xyz lorem ipslili


 Yes, [http://askme.in] looks good!

 I would like to know its design/Solr configuration etc. Can you
 please provide a detailed view of it?

 In [http://askme.in], there is one thing to be noted. Search text like
 [business c] populates [Business Centre], which looks OK, but [Consultant
 Business] looks a bit odd. But, in general, the pointer you suggested is
 a great place to start.

 On 8/2/2010 8:39 PM, Avlesh Singh wrote:


  From whatever I could read in your broken table of sample use cases, I think
 you are looking for something similar to what has been done here -
 http://askme.in; if this is what you are looking for, do let me know.

 Cheers
 Avlesh
 @avlesh http://twitter.com/avlesh | http://webklipper.com

 On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik Gajjar
 bhavnik.gaj...@gatewaynintec.com wrote:




  Hi,

 I'm looking for a solution related to auto complete feature for one
 application.

 Below is a list of texts from which auto complete results would be
 populated.

 Lorem ipsum dolor sit amet
 tincidunt ut laoreet
 dolore eu feugiat nulla facilisis at vero eros et
 te feugait nulla facilisi
 Claritas est etiam processus
 anteposuerit litterarum formas humanitatis
 fiant sollemnes in futurum
 Hieyed ddi lorem ipsum dolor
 test lorem ipsume
 test xyz lorem ipslili

 Consider the table below. The first column shows the user-entered value and
 the second column the expected result (the list of auto-complete terms
 that should be populated from Solr):

 lorem
 *Lorem* ipsum dolor sit amet
 Hieyed ddi *lorem* ipsum dolor
 test *lorem *ipsume
 test xyz *lorem *ipslili
 lorem ip
 *Lorem ip*sum dolor sit amet
 Hieyed ddi *lorem ip*sum dolor
 test *lorem ip*sume
 test xyz *lorem ip*slili
 lorem ipsl
 test xyz *lorem ipsl*ili



 Can anyone share ideas on how this can be achieved?

Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420

Thanks, man. I haven't tried this, but where do I put that XML configuration?
Does it go in Solr's web.xml?

Cheers,
Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023188.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread jayendra patil
The solr home is configured in the web.xml of the application; it points
to the folder containing the conf files and the data directory:

<env-entry>
   <env-entry-name>solr/home</env-entry-name>
   <env-entry-value>D:/multicore</env-entry-value>
   <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

Regards,
Jayendra

On Wed, Aug 4, 2010 at 12:21 PM, Hando420 hando...@gmail.com wrote:


 Thanks, man. I haven't tried this, but where do I put that XML configuration?
 Does it go in Solr's web.xml?

 Cheers,
 Hando
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023188.html
 Sent from the Solr - User mailing list archive at Nabble.com.



can't use strdist as functionquery?

2010-08-04 Thread solr-user

I want to sort my results by how closely a given resultset field matches a
given string.

For example, say I am searching for a given product, and the product can be
found in many cities including Seattle. I want to sort the results so
that results from the city of "seattle" are at the top, and all other results
below that.

I thought that I could do this by using strdist as a functionquery (I am using
Solr 1.4, so I can't directly sort on strdist), but am having problems with
the syntax of the query, because function queries require double quotes and
so does strdist.

My current query, which fails with an NPE, looks something like this:

http://localhost:8080/solr/select?q=(product:foo) _val_:"strdist("seattle",city,edit)"&sort=score%20asc&fl=product,city,score

I have tried various types of URL encoding (i.e. using %22 instead of double
quotes in the strdist function), but without success.

Any ideas? Is there a better way to accomplish this sorting?
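For reference, strdist(x, y, edit) returns a similarity in [0, 1] derived from Levenshtein edit distance (1 minus the distance divided by the longer length), not the raw distance, so sorting *descending* by it puts exact matches like "seattle" first. A standalone sketch of that computation (my reading of the Lucene implementation; verify against your Solr version):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def strdist_edit(a: str, b: str) -> float:
    """Similarity as strdist(...,edit) reports it: 1 - dist/maxlen."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

assert strdist_edit("seattle", "seattle") == 1.0          # exact match sorts first
assert abs(strdist_edit("seattle", "seatle") - (1 - 1/7)) < 1e-9
```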

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1023390.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Setting up apache solr in eclipse with Tomcat

2010-08-04 Thread Hando420

Thanks now its clear and works fine.

Regards,
Hando
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Setting-up-apache-solr-in-eclipse-with-Tomcat-tp1021673p1023404.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sharing index files between multiple JVMs and replication

2010-08-04 Thread Kelly Taylor
Is anybody else encountering these same issues with a similar setup? And is
there a way to configure certain Solr web-apps as read-only (basically dummy
instances) so that index changes are not allowed?



- Original Message 
From: Kelly Taylor wired...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Tue, August 3, 2010 5:48:11 PM
Subject: Re: Sharing index files between multiple JVMs and replication

Yes, they are on a common file server, and I've been sharing the same index
directory between the Solr JVMs. But I seem to be hitting a wall when
attempting to use just one instance for changing the index.

With Solr replication disabled, I stream updates to the one instance, and this
process hangs whenever there are additional Solr JVMs started up with the same
configuration in solrconfig.xml. So I then tried, to no avail, using a
different configuration, solrconfig-readonly.xml, where the updateHandler was
commented out, all /update* requestHandlers removed, mainIndex lockType of
none, etc.

And with Solr replication enabled, the slave seems to hang, or at least
reports unusually long time estimates for the currently running replication
process to complete.


-Kelly



- Original Message 
From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, August 3, 2010 4:56:58 PM
Subject: Re: Sharing index files between multiple JVMs and replication

Are these files on a common file server? If you want to share them
that way, it actually does work just to give them all the same index
directory, as long as only one of them changes it.

On Tue, Aug 3, 2010 at 4:38 PM, Kelly Taylor wired...@yahoo.com wrote:
 Is there a way to share index files amongst my multiple Solr web-apps, by
 configuring only one of the JVMs as an indexer, and the remaining, as 
read-only
 searchers?

 I'd like to configure in such a way that on startup of the read-only 
searchers,
 missing cores/indexes are not created, and updates are not handled.

 If I can get around the files being locked by the read-only instances, I 
should
 be able to scale wider in a given environment, as well as have less replicated
 copies of my master index (Solr 1.4 Java Replication).

 Then once the commit is issued to the slave, I can fire off a RELOAD script 
for
 each of my read-only cores.

 -Kelly








-- 
Lance Norskog
goks...@gmail.com






Re: analysis tool vs. reality

2010-08-04 Thread Chris Hostetter

: I think I agree with Justin here, I think the way analysis tool highlights
: 'matches' is extremely misleading, especially considering it completely
: ignores queryparsing.

it really only attempts to identify when there is overlap between 
analysis at query time and at indexing time, so you can easily spot when 
one analyzer or the other breaks things so that they no longer line up 
(or when it fixes things so they start to line up).

Even if we eliminated that highlighting as misleading, people would still 
do it in their minds, it would just be harder -- it doesn't change the 
underlying fact that analysis is only part of the picture.
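Concretely, the overlap highlighting amounts to a set intersection between the index-time and query-time token streams for a single field. A toy sketch (whitespace + lowercase standing in for a real analysis chain) that also shows why it says nothing about query parsing or term positions:

```python
def analyze(text):
    # stand-in for a field's analysis chain: whitespace tokenize + lowercase
    return [t.lower() for t in text.split()]

def analysis_matches(index_text, query_text):
    """Which query-time tokens also appear among the index-time tokens --
    roughly what the admin analysis page highlights. No query parser,
    no phrase/position logic, hence the confusion in this thread."""
    index_tokens = set(analyze(index_text))
    return [t for t in analyze(query_text) if t in index_tokens]

assert analysis_matches("Lorem ipsum dolor", "DOLOR sit") == ["dolor"]
```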

: it would be better if it put your text in a memoryindex and actually parsed
: the query w/ queryparser, ran it, and used the highlighter to try to show
: any matches.

That level of query explanation really only works if the user gives us a 
full document (all fields, not just one), a full query string, and all 
of the possible query params -- because the query parser (either implicit 
because of config, or explicitly specified by the user) might change its 
behavior based on those other params.

I agree with you: debugging functionality along the lines of what you are 
describing would be *VASTLY* more useful than what we've got right now, 
and is something I briefly looked into doing before as an extension of the 
existing DebugComponent...

   https://issues.apache.org/jira/browse/SOLR-1749

...the problems I encountered trying to do it as a debug component on 
a real Solr request seem like they would also be problems for a 
MemoryIndex based admin tool approach like what you suggest -- but if 
you've got ideas on working around them I am 100% interested.

Independent of how we might create a better QueryParser + Analysis 
explanation tool / debug component is the question of what we can do to 
make it more clear what exactly the analysis.jsp page is doing and what 
people can infer from that page.  As I said, I don't think removing the 
match highlighting will actually reduce confusion, but perhaps there is 
verbiage/disclaimers that could be added to make it more clear?



-Hoss



Re: analysis tool vs. reality

2010-08-04 Thread Robert Muir
Furthermore, I would like to add that it's not just the highlight-matches
functionality that is horribly broken here; the output of the analysis
itself is misleading.

Let's say I take 'textTight' from the example, and add the following synonym:

this is broken => broke

The query-time analysis is wrong, as it clearly shows SynonymFilter
collapsing "this is broken" to "broke", but in reality, with the query parser
for that field, you are going to get 3 separate token streams and this will
never actually happen (because the query parser will divide it up on
whitespace first).

So really the output from 'Query Analyzer' is completely bogus.

On Wed, Aug 4, 2010 at 1:57 PM, Robert Muir rcm...@gmail.com wrote:



 On Wed, Aug 4, 2010 at 1:45 PM, Chris Hostetter
 hossman_luc...@fucit.org wrote:


 it really only attempts to identify when there is overlap between
 analysis at query time and at indexing time, so you can easily spot when
 one analyzer or the other breaks things so that they no longer line up
 (or when it fixes things so they start to line up)


 It attempts badly, because it only works in the most trivial of cases
 (e.g. it doesn't reflect the interaction of the query parser with multiword
 synonyms or WordDelimiterFilter).

 Since Solr includes these non-trivial analysis components *in the example*,
 it means that this 'highlight matches' doesn't actually even really work at
 all.

 Someone is going to use this thing when they don't understand why analysis
 isn't doing what they want, i.e. the cases like I outlined above.

 For the trivial cases where it does work, the 'highlight matches' isn't
 useful anyway, so in its current state it's completely unnecessary.


 Even if we eliminated that highlighting as misleading, people would still
 do it in their minds, it would just be harder -- it doesn't change the
 underlying fact that analysis is only part of the picture.


 I'm not suggesting that. I'm suggesting fixing the highlighting so it's not
 misleading. There are really only two choices:
 1. remove the current highlighting
 2. fix it.

 In its current state it's completely useless and misleading, except for very
 trivial cases, in which you don't need it anyway.



 : it would be better if it put your text in a memoryindex and actually parsed
 : the query w/ queryparser, ran it, and used the highlighter to try to show
 : any matches.

 That level of query explanation really only works if the user gives us a
 full document (all fields, not just one), a full query string, and all
 of the possible query params -- because the query parser (either implicit
 because of config, or explicitly specified by the user) might change its
 behavior based on those other params.


 That's true, but I don't see why the user couldn't be allowed to provide
 just this.
 I'd bet money a lot of people are using this thing with a specific
 query/document in mind anyway!


 people can infer from that page.  As I said, I don't think removing the
 match highlighting will actually reduce confusion, but perhaps there is
 verbiage/disclaimers that could be added to make it more clear?


  As I said before, I think I disagree with you. I think for stuff like this
 the technicals are less important; what's important is that this is a
 misleading checkbox that really confuses users.

 I suggest disabling it entirely; you are only going to remove confusion.


 --
 Robert Muir
 rcm...@gmail.com




-- 
Robert Muir
rcm...@gmail.com


Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Ken Krugler

Hi Geert-Jan,

On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:

 Field Collapsing (currently as patch) is exactly what you're looking for
 imo.

 http://wiki.apache.org/solr/FieldCollapsing

Thanks for the ref, good stuff.

I think it's close, but if I understand this correctly, then I could
get (using just top two, versus top 10 for simplicity) results that
looked like:

dog training (faceted field value A)
super dog (faceted field value B)

but if the actual faceted field value/hit counts were:

C (10)
D (8)
A (2)
B (1)

Then what I'd want is the top hit for "dog" AND facet field:C,
followed by "dog" AND facet field:D.

Using field collapsing would improve the probability that if I asked
for the top 100 hits, I'd find entries for each of my top N faceted
field values.
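Pending a patched field collapsing, the desired behaviour can be approximated client-side from a single over-fetched result page: count facet values across the hits, keep the top N values, and keep the best-scored hit for each. A sketch under that assumption (hits already sorted by descending score; field names invented):

```python
from collections import Counter

def top_hit_per_facet(hits, facet_field, n=2):
    """hits: list of dicts sorted by descending score.
    Returns [(facet_value, hit_count, best_hit), ...] for the n most
    frequent facet values -- a one-request approximation of the
    11-request scheme described above."""
    counts = Counter(h[facet_field] for h in hits)
    top_values = [v for v, _ in counts.most_common(n)]
    best = {}
    for h in hits:                      # first hit seen per value is its best
        v = h[facet_field]
        if v in top_values and v not in best:
            best[v] = h
    return [(v, counts[v], best[v]) for v in top_values]

hits = ([{"title": "c%d" % i, "site": "C"} for i in range(10)] +
        [{"title": "d%d" % i, "site": "D"} for i in range(8)] +
        [{"title": "dog training", "site": "A"},
         {"title": "super dog", "site": "B"}])
result = top_hit_per_facet(hits, "site", n=2)
assert [(v, c) for v, c, _ in result] == [("C", 10), ("D", 8)]
```

The obvious caveat: if a top facet value has no hit inside the over-fetched page, it is missed, which is exactly the probability argument above about asking for the top 100 hits.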


Thanks again,

-- Ken

I've got a situation where the key result from an initial search request
(let's say for "dog") is the list of values from a faceted field, sorted by
hit count.

For the top 10 of these faceted field values, I need to get the top hit for
the target request ("dog") restricted to that value for the faceted field.

Currently this is 11 total requests, of which the 10 requests following the
initial query can be made in parallel. But that's still a lot of requests.


So my questions are:

1. Is there any magic query to handle this with Solr as-is?

2. if not, is the best solution to create my own request handler?

3. And in that case, any input/tips on developing this type of custom
request handler?

Thanks,

-- Ken



Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: DIH and Cassandra

2010-08-04 Thread Andrei Savu
DIH only works with relational databases and XML files [1]; you need
to write custom code in order to index data from Cassandra.

It should be pretty easy to map documents from Cassandra to Solr.
There are a lot of client libraries available [2] for Cassandra.

[1] http://wiki.apache.org/solr/DataImportHandler
[2] http://wiki.apache.org/cassandra/ClientOptions
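The custom glue can be small: read rows with whichever Cassandra client you pick, map each row to a Solr document, and POST the result to /update. A client-agnostic sketch of the mapping step (field names invented; the XML is the 1.4-era update format):

```python
from xml.sax.saxutils import escape

def to_solr_add(docs):
    """Render dicts as a Solr <add> XML update message.
    Multivalued fields are plain Python lists."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                parts.append('<field name="%s">%s</field>'
                             % (escape(name), escape(str(v))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

# e.g. one Cassandra row flattened to a dict, then POSTed to
# http://host:8983/solr/update followed by a commit.
xml = to_solr_add([{"id": "row-1", "keyspace": "Users", "tags": ["a", "b"]}])
print(xml)
```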

On Wed, Aug 4, 2010 at 6:41 PM, Mark static.void@gmail.com wrote:
 Is it possible to use DIH with Cassandra either out of the box or with
 something more custom? Thanks




-- 
Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr



Re: Is there a better way for Solr server-side load balancing?

2010-08-04 Thread Andrei Savu
Check this article [1] that explains how to set up haproxy to do load
balancing. The steps are the same even if you are not using Drupal. By
using this approach you can easily add more replicas without changing
the application configuration files.

You should also check SolrCloud [2] which does automatic load
balancing and fail-over for queries. This branch is still under
development.

[1] 
http://davehall.com.au/blog/dave/2010/03/13/solr-replication-load-balancing-haproxy-and-drupal
[2] http://wiki.apache.org/solr/SolrCloud

2010/8/4 Chengyang atreey...@163.com:
 The default Solr setup is client-side load balancing.
 Is there a solution that provides server-side load balancing?



-- 
Indekspot -- http://www.indekspot.com -- Managed Hosting for Apache Solr


Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread Tod
I'm running a slight variation of the example code referenced below and 
it takes a really long time to finally execute.  In fact it hangs for a 
long time at solr.request(up) before finally executing.  Is there 
anything I can look at or tweak to improve performance?


I am indexing a local PDF file; there are no firewall issues, Solr 
is running on the same machine, and I tried the actual host name in 
addition to localhost, but nothing helps.



Thanks - Tod

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample


Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Geert-Jan Brits
If I understand correctly: you want to sort your collapsed results by 'nr of
collapsed results' / hits.

It seems this can't be done out of the box using this patch (I'm not
entirely sure; at least it doesn't follow from the wiki page. Perhaps it's
best to check the jira issues to make sure this isn't already available now,
but just not yet updated on the wiki).

I also found a blog post (from the patch creator, afaik) with, in the
comments, someone with the same issue + some pointers:
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

hope that helps,
Geert-jan

2010/8/4 Ken Krugler kkrugler_li...@transpac.com

 Hi Geert-Jan,


 On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:

  Field Collapsing (currently as patch) is exactly what you're looking for
 imo.

 http://wiki.apache.org/solr/FieldCollapsing


 Thanks for the ref, good stuff.

 I think it's close, but if I understand this correctly, then I could get
 (using just top two, versus top 10 for simplicity) results that looked like

 dog training (faceted field value A)
 super dog (faceted field value B)

 but if the actual faceted field value/hit counts were:

 C (10)
 D (8)
 A (2)
 B (1)

 Then what I'd want is the top hit for "dog" AND facet field:C, followed by
 "dog" AND facet field:D.

 Using field collapsing would improve the probability that if I asked for the
 top 100 hits, I'd find entries for each of my top N faceted field values.

 Thanks again,

 -- Ken


  I've got a situation where the key result from an initial search request
 (let's say for dog) is the list of values from a faceted field, sorted
 by
 hit count.

 For the top 10 of these faceted field values, I need to get the top hit
 for
 the target request (dog) restricted to that value for the faceted
 field.

 Currently this is 11 total requests, of which the 10 requests following
 the
 initial query can be made in parallel. But that's still a lot of
 requests.

 So my questions are:

 1. Is there any magic query to handle this with Solr as-is?

 2. if not, is the best solution to create my own request handler?

 3. And in that case, any input/tips on developing this type of custom
 request handler?

 Thanks,

 -- Ken


 
 Ken Krugler
 +1 530-210-6378
 http://bixolabs.com
 e l a s t i c   w e b   m i n i n g







Re: Best solution to avoiding multiple query requests

2010-08-04 Thread Ken Krugler

Hi Geert-jan,

On Aug 4, 2010, at 12:04pm, Geert-Jan Brits wrote:

 If I understand correctly: you want to sort your collapsed results by 'nr of
 collapsed results' / hits.

 It seems this can't be done out-of-the-box using this patch (I'm not
 entirely sure, at least it doesn't follow from the wiki-page. Perhaps best
 is to check the jira-issues to make sure this isn't already available now,
 but just not updated on the wiki)

 Also I found a blogpost (from the patch creator afaik) with in the comments
 someone with the same issue + some pointers.
 http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/

Yup, that's the one -
http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/comment-page-1/#comment-1249

So with some modifications to that patch, it could work... thanks for
the info!

-- Ken


2010/8/4 Ken Krugler kkrugler_li...@transpac.com


Hi Geert-Jan,


On Aug 4, 2010, at 5:30am, Geert-Jan Brits wrote:

Field Collapsing (currently as patch) is exactly what you're  
looking for

imo.

http://wiki.apache.org/solr/FieldCollapsing



Thanks for the ref, good stuff.

I think it's close, but if I understand this correctly, then I could get
(using just the top two, versus top 10, for simplicity) results that looked
like:

dog training (faceted field value A)
super dog (faceted field value B)

but if the actual faceted field value/hit counts were:

C (10)
D (8)
A (2)
B (1)

then what I'd want is the top hit for "dog" AND facet field:C, followed by
"dog" AND facet field:D.

Using field collapsing would improve the probability that if I asked for the
top 100 hits, I'd find entries for each of my top N faceted field values.


Thanks again,

-- Ken


I've got a situation where the key result from an initial search request
(let's say for "dog") is the list of values from a faceted field, sorted
by hit count.

For the top 10 of these faceted field values, I need to get the top hit
for the target request ("dog") restricted to that value for the faceted
field.

Currently this is 11 total requests, of which the 10 requests following
the initial query can be made in parallel. But that's still a lot of
requests.

So my questions are:

1. Is there any magic query to handle this with Solr as-is?

2. If not, is the best solution to create my own request handler?

3. And in that case, any input/tips on developing this type of  
custom

request handler?

Thanks,

-- Ken





Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g








Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Indexing boolean value

2010-08-04 Thread PeterKerk

I'm trying to index a boolean value, but for some reason it does not show
up in my indexed data.

data-config.xml:

<entity name="location" query="select * from locations">
    <field name="id" column="ID" />
    <field name="title" column="TITLE" />
    <field name="city" column="CITY" />
    <field name="official" column="OFFICIALLOCATION" />

OFFICIALLOCATION is an MSSQL database field of type 'bit'.


schema.xml:

<field name="official" type="boolean" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(I'm not sure why I would use copyField; I also tried it without that line,
but still without luck.)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023708.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Michael Griffiths
Your schema.xml setting for the field is probably tokenizing on punctuation.
Change the field type to one that doesn't tokenize on punctuation, e.g. use
"text_ws" rather than "text".

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 3:36 PM
To: solr-user@lucene.apache.org
Subject: Indexing fieldvalues with dashes and spaces


I'm having issues indexing field values containing spaces and dashes.
For example, I'm trying to index the province names of the Netherlands. Some
province names contain a '-':
Zuid-Holland
Noord-Holland

my data-config has this:

<entity name="location_province" query="select provinceid from
        locations where id=${location.id}">
    <entity name="provinces" query="select title from provinces
            where id = ${location_province.provinceid}">
        <field name="province" column="title" />
    </entity>
</entity>


When I check what has been indexed, I have this:
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">*:*</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="3" start="0">
  <doc>
    <str name="city">Nijmegen</str>
    <arr name="features"><str>Tuin</str><str>Cafe</str></arr>
    <str name="id">1</str>
    <str name="province">Gelderland</str>
    <arr name="services"><str>Fotoreportage</str></arr>
    <arr name="theme"><str>Gemeentehuis</str></arr>
    <date name="timestamp">2010-08-04T19:11:51.796Z</date>
    <str name="title">Gemeentehuis Nijmegen</str>
  </doc>
  <doc>
    <str name="city">Utrecht</str>
    <arr name="features"><str>Tuin</str><str>Cafe</str><str>Danszaal</str></arr>
    <str name="id">2</str>
    <str name="province">Utrecht</str>
    <arr name="services"><str>Fotoreportage</str><str>Exclusieve huur</str></arr>
    <arr name="theme"><str>Gemeentehuis</str></arr>
    <date name="timestamp">2010-08-04T19:11:51.796Z</date>
    <str name="title">Gemeentehuis Utrecht</str>
  </doc>
  <doc>
    <str name="city">Bloemendaal</str>
    <arr name="features"><str>Strand</str><str>Cafe</str><str>Danszaal</str></arr>
    <str name="id">3</str>
    <str name="province">Zuid-Holland</str>
    <arr name="services"><str>Exclusieve huur</str><str>Live muziek</str></arr>
    <arr name="theme"><str>Strand &amp; Zee</str></arr>
    <date name="timestamp">2010-08-04T19:11:51.812Z</date>
    <str name="title">Beachclub Vroeger</str>
  </doc>
</result>
</response>



So we see that the full field has been indexed:
<str name="province">Zuid-Holland</str>


BUT, when I check the facets via
http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=id,title,city,score,features,official,services&facet=true&facet.field=theme&facet.field=features&facet.field=province&facet.field=services

I get this (snippet):

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "theme":[
      "Gemeentehuis",2,
      "&",1,              <=== a
      "Strand",1,
      "Zee",1],
    "features":[
      "cafe",3,
      "danszaal",2,
      "tuin",2,
      "strand",1],
    "province":[
      "gelderland",1,
      "holland",1,
      "utrecht",1,
      "zuid",1,           <=== b
      "zuidholland",1],
    "services":[
      "exclusiev",2,
      "fotoreportag",2,   <=== c
      "huur",2,
      "live",1,           <=== d
      "muziek",1]},


Several weird things happen here, which I have indicated with <===:

a. the full field value is "Strand & Zee", but now one facet is "&"
b. the full field value is "Zuid-Holland", but now "zuid" is a separate facet
c. the full field value is "fotoreportage", but somehow the last character
has been truncated
d. the full field value is "live muziek", but now "live" and "muziek" have
become separate facets

What can I do about this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023699.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Indexing boolean value

2010-08-04 Thread Michael Griffiths
I could be wrong, but I thought 'bit' was an integer type. Try changing the
field type to integer.

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 3:42 PM
To: solr-user@lucene.apache.org
Subject: Indexing boolean value


I'm trying to index a boolean value, but for some reason it does not show up
in my indexed data.

data-config.xml:

<entity name="location" query="select * from locations">
    <field name="id" column="ID" />
    <field name="title" column="TITLE" />
    <field name="city" column="CITY" />
    <field name="official" column="OFFICIALLOCATION" />

OFFICIALLOCATION is an MSSQL database field of type 'bit'.


schema.xml:

<field name="official" type="boolean" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(I'm not sure why I would use copyField; I also tried it without that line,
but still without luck.)
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023708.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk

I changed the field types to "text_ws".

Now I only seem to have problems with field values that hold spaces...see
below:

   <field name="city" type="text_ws" indexed="true" stored="true"/>
   <field name="theme" type="text_ws" indexed="true" stored="true"
          multiValued="true" omitNorms="true" termVectors="true" />
   <field name="features" type="text_ws" indexed="true" stored="true"
          multiValued="true"/>
   <field name="services" type="text_ws" indexed="true" stored="true"
          multiValued="true"/>
   <field name="province" type="text_ws" indexed="true" stored="true"/>

It has now become:

 "facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "theme":[
      "Gemeentehuis",2,
      "&",1,               <=== "&" is still created as a separate facet
      "Strand",1,
      "Zee",1],
    "features":[
      "Cafe",3,
      "Danszaal",2,
      "Tuin",2,
      "Strand",1],
    "province":[
      "Gelderland",1,
      "Utrecht",1,
      "Zuid-Holland",1],   <=== this is now correct
    "services":[
      "Exclusieve",2,
      "Fotoreportage",2,
      "huur",2,
      "Live",1,            <=== "Live muziek" is split and separate facets
                                are created
      "muziek",1]},
  "facet_dates":{}}}
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma
You shouldn't fetch faceting results from analyzed fields; it will mess up
your results. Search on analyzed fields, but don't retrieve facet values from
them.
 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Wed 04-08-2010 22:15
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


I changed the field types to "text_ws".

Now I only seem to have problems with field values that hold spaces...see
below:

  <field name="city" type="text_ws" indexed="true" stored="true"/>
  <field name="theme" type="text_ws" indexed="true" stored="true"
         multiValued="true" omitNorms="true" termVectors="true" />
  <field name="features" type="text_ws" indexed="true" stored="true"
         multiValued="true"/>
  <field name="services" type="text_ws" indexed="true" stored="true"
         multiValued="true"/>
  <field name="province" type="text_ws" indexed="true" stored="true"/>

It has now become:

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "theme":[
      "Gemeentehuis",2,
      "&",1,               <=== "&" is still created as a separate facet
      "Strand",1,
      "Zee",1],
    "features":[
      "Cafe",3,
      "Danszaal",2,
      "Tuin",2,
      "Strand",1],
    "province":[
      "Gelderland",1,
      "Utrecht",1,
      "Zuid-Holland",1],   <=== this is now correct
    "services":[
      "Exclusieve",2,
      "Fotoreportage",2,
      "huur",2,
      "Live",1,            <=== "Live muziek" is split and separate facets
                                are created
      "muziek",1]},
  "facet_dates":{}}}
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing boolean value

2010-08-04 Thread PeterKerk

Hi,

I tried that already, so that would make this:

<field name="official" type="integer" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(still not sure what copyField does though)

But even that won't work. I also don't see the OFFICIALLOCATION column indexed
in the documents:
http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023811.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk

Sorry, but I'm a newbie to Solr...how would I change my schema.xml to match
your suggestion?

And what do you mean by "it will mess up your results"? What will happen
then?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023824.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Michael Griffiths
Echoing Markus - use the tokenized field to return search results, but keep a
duplicate field of fieldtype="string" to show the untokenized values, and
facet on that field.
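A minimal schema.xml sketch of this setup (the field and type names here are
illustrative, not taken from the thread):

```xml
<!-- analyzed field: used for searching -->
<field name="province" type="text_ws" indexed="true" stored="true"/>
<!-- untokenized copy: used for faceting/display; "province_raw" is a made-up name -->
<field name="province_raw" type="string" indexed="true" stored="true"/>
<!-- copyField duplicates the raw source value before any analysis -->
<copyField source="province" dest="province_raw"/>
```

Queries would then search on province but facet with facet.field=province_raw,
so "Zuid-Holland" survives as a single facet value.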

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@buyways.nl] 
Sent: Wednesday, August 04, 2010 4:18 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing fieldvalues with dashes and spaces

You shouldn't fetch faceting results from analyzed fields; it will mess up
your results. Search on analyzed fields, but don't retrieve facet values from
them.
 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Wed 04-08-2010 22:15
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


I changed the field types to "text_ws".

Now I only seem to have problems with field values that hold spaces...see
below:

  <field name="city" type="text_ws" indexed="true" stored="true"/>
  <field name="theme" type="text_ws" indexed="true" stored="true"
         multiValued="true" omitNorms="true" termVectors="true" />
  <field name="features" type="text_ws" indexed="true" stored="true"
         multiValued="true"/>
  <field name="services" type="text_ws" indexed="true" stored="true"
         multiValued="true"/>
  <field name="province" type="text_ws" indexed="true" stored="true"/>

It has now become:

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "theme":[
      "Gemeentehuis",2,
      "&",1,               <=== "&" is still created as a separate facet
      "Strand",1,
      "Zee",1],
    "features":[
      "Cafe",3,
      "Danszaal",2,
      "Tuin",2,
      "Strand",1],
    "province":[
      "Gelderland",1,
      "Utrecht",1,
      "Zuid-Holland",1],   <=== this is now correct
    "services":[
      "Exclusieve",2,
      "Fotoreportage",2,
      "huur",2,
      "Live",1,            <=== "Live muziek" is split and separate facets
                                are created
      "muziek",1]},
  "facet_dates":{}}}
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Indexing boolean value

2010-08-04 Thread Michael Griffiths
copyField copies a field's content so you can have multiple versions of it.
It's useful, for example, to dump all fields into one "super" field you can
search on, for performance reasons.

If the column isn't being indexed, I'd suggest the problem is in DIH. No
suggestions as to why, I'm afraid.
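One thing worth trying (my own assumption, not something verified in this
thread): since the JDBC driver may hand the MSSQL 'bit' column back as a type
DIH doesn't map cleanly, casting it in the SELECT sidesteps the type mapping
entirely. A hypothetical data-config.xml tweak:

```xml
<!-- hypothetical: cast the bit column to a string in SQL so DIH sees plain text -->
<entity name="location"
        query="select ID, TITLE, CITY,
                      CAST(OFFICIALLOCATION AS varchar(5)) AS OFFICIALLOCATION
               from locations">
```

If the cast version shows up in the index, the original problem was the
driver's Boolean/bit handling rather than the Solr schema.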

-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Wednesday, August 04, 2010 4:22 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing boolean value


Hi,

I tried that already, so that would make this:

<field name="official" type="integer" indexed="true" stored="true"/>
<copyField source="official" dest="text" />

(still not sure what copyField does though)

But even that won't work. I also don't see the OFFICIALLOCATION column indexed
in the documents:
http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-boolean-value-tp1023708p1023811.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Markus Jelsma
Hmm, you should first read a bit more on schema design on the wiki and learn
about indexing and querying Solr.

 

The copyField directive is what is commonly used in a faceted navigation
system: search on analyzed fields, show faceting results using the primitive
string field type. With copyField you can, well, copy a field from one to
another without it being analyzed by the first - so no chaining is possible,
which is good.

 

Let's say you have a city field you want to navigate with, but also search
in; then you would have an analyzed field for search and a string field for
displaying the navigation.

 

But, check the wiki on this subject.
 
-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Wed 04-08-2010 22:23
To: solr-user@lucene.apache.org; 
Subject: RE: Indexing fieldvalues with dashes and spaces


Sorry, but I'm a newbie to Solr...how would I change my schema.xml to match
your suggestion?

And what do you mean by "it will mess up your results"? What will happen
then?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023824.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH and Cassandra

2010-08-04 Thread Shalin Shekhar Mangar
On Wed, Aug 4, 2010 at 9:11 PM, Mark static.void@gmail.com wrote:

 Is it possible to use DIH with Cassandra either out of the box or with
 something more custom? Thanks


It will take some modifications, but DIH is built to create denormalized
documents, so it is possible.

Also see https://issues.apache.org/jira/browse/SOLR-853

-- 
Regards,
Shalin Shekhar Mangar.


RE: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread PeterKerk

Well, the example you provided is 100% relevant to me :)

I've read the wiki now (SchemaXml, SolrFacetingOverview, Query Syntax,
SimpleFacetParameters), but still do not have an exact idea of what you
mean.

My situation:
a city field is something that I want users to search on via text input, so
let's say "New Yo" would give the results for "New York".
But also a facet "Cities" is available, in which "New York" is just one of
the cities that is clickable.

The other facet is "theme", which in my example holds values like
"Gemeentehuis" and "Strand & Zee"; that would not be something that can be
searched via manual input but IS clickable.

If you look at my schema.xml, do you see stuff I'm doing that is absolutely
wrong for the purpose described above? Because as far as I can see the
documents are indexed correctly (BESIDES the spaces in the field values).

Any help is greatly appreciated! :)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023992.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH and Cassandra

2010-08-04 Thread Dennis Gearon
If data is stored in the index, isn't the index of Solr pretty much already a
'Big/Cassandra Table', except with tokenized columns to make searching easier?

How are Cassandra/Big/Couch DBs doing text/weighted searching?

Seems a real duplication to use Cassandra AND Solr. OTOH, I don't know how
many 'tables'/indexes one can make using Solr; I'm still a newbie.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 8/4/10, Andrei Savu andrei.s...@indekspot.com wrote:

 From: Andrei Savu andrei.s...@indekspot.com
 Subject: Re: DIH and Cassandra
 To: solr-user@lucene.apache.org
 Date: Wednesday, August 4, 2010, 12:00 PM
 DIH only works with relational
 databases and XML files [1], you need
 to write custom code in order to index data from
 Cassandra.
 
 It should be pretty easy to map documents from Cassandra to
 Solr.
 There are a lot of client libraries available [2] for
 Cassandra.
 
 [1] http://wiki.apache.org/solr/DataImportHandler
 [2] http://wiki.apache.org/cassandra/ClientOptions
 
 On Wed, Aug 4, 2010 at 6:41 PM, Mark static.void@gmail.com
 wrote:
  Is it possible to use DIH with Cassandra either out of
 the box or with
  something more custom? Thanks
 
 
 
 
 -- 
 Indekspot -- http://www.indekspot.com -- Managed
 Hosting for Apache Solr
 


Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Peter Karich

 The default solr solution is client side loadbalance.
 Is there a solution provide the server side loadbalance?


 
 No. Most of us stick a HTTP load balancer in front of multiple Solr servers.
   

E.g. mod_jk is a very easy solution (maybe too simple/stupid?) for a load
balancer, and it also offers failover functionality:

It is as simple as:

worker.loadbalancer.balance_workers=worker1,worker2,worker3,...

and the failover:

worker.worker1.redirect=worker2



Re: Some basic DataImportHandler questions

2010-08-04 Thread harrysmith

Thanks, I think part of my issue may be that I am misunderstanding how to use
the entity and field tags to import data in a particular format, and I am
looking for a few more examples.

Let's say I have a database table with 2 columns that contain metadata fields
and values, and I would like to import this into Solr and keep the pairs
together. An example database table follows, consisting of two String
columns, one containing metadata names and the other metadata values (column
names: metadata_name, metadata_value in this example). There may be multiple
records for a name. The set of potential metadata_names is unknown; it could
be anything.

metadata_name    metadata_value
===============================
title            blah blah
subject          some subject
subject          another subject
name             some name


What is the proper way to import these and keep the name/value pairs intact.
I am seeing the following after import:

<arr name="metadata_name_s">
  <str>title</str>
  <str>subject</str>
  <str>name</str>
</arr>

<arr name="metadata_value_s">
  <str>blah blah</str>
  <str>some subject</str>
  <str>another subject</str>
  <str>some name</str>
</arr>

Ideally, the end goal would be something like below:

<arr name="title_s">
  <str>some subject</str>
</arr>

<arr name="name_s">
  <str>some name</str>
</arr>

etc

It feels like I am missing something obvious and this would be a common
structure for imports.
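One way to get this pivot (a sketch, not tested against this exact setup) is
DIH's ScriptTransformer, which lets a JavaScript function rename each row's
value column after the name column. The table and function names below are
made up for illustration:

```xml
<dataConfig>
  <!-- ScriptTransformer needs the JVM's built-in JavaScript engine (Java 6+) -->
  <script><![CDATA[
    function pivot(row) {
      // turn each (metadata_name, metadata_value) row into a dynamic field,
      // e.g. title_s = "blah blah"
      row.put(row.get('metadata_name') + '_s', row.get('metadata_value'));
      return row;
    }
  ]]></script>
  <document>
    <entity name="meta" transformer="script:pivot"
            query="select metadata_name, metadata_value from metadata_table"/>
  </document>
</dataConfig>
```

Note that as a root entity this would make one Solr document per row; to get
all pairs on one document you'd nest it under the parent record's entity.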





 Just starting with DataImportHandler and had a few simple questions.

 Is there a location for more in-depth documentation other than
 http://wiki.apache.org/solr/DataImportHandler?


Umm, no, but let us know what is not covered well and it can be added.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Some-basic-DataImportHandler-questions-tp1010291p1024205.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: No group by? looking for an alternative.

2010-08-04 Thread Lance Norskog
Hello-

A way to do this is to create one faceting field that includes both the
size and the color. I assume you have a different shoe product
document for each model. Each model would include the color & size
fields 'red' and '14a', but you would also add a field with 'red-14a'.
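A tiny sketch of the combined-token idea. The helper name and normalization
are my own illustration (the thread only specifies the 'red-14a' shape); the
point is that faceting on this single field only ever surfaces size/color
combinations that actually exist together on a document:

```java
public class CombinedFacet {
    // Hypothetical helper: build the single color-size facet token,
    // so filtering fq=color_size:red-* style queries (or faceting on it)
    // can never pair a color with a size it doesn't ship in.
    static String colorSize(String color, String size) {
        return color.toLowerCase() + "-" + size.toLowerCase();
    }

    public static void main(String[] args) {
        // each indexed document would carry this value in its combined field
        System.out.println(colorSize("Red", "14A")); // prints red-14a
    }
}
```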

On Wed, Aug 4, 2010 at 7:17 AM, Mickael Magniez
mickaelmagn...@gmail.com wrote:

 Hello,

 I'm dealing with a problem since few days  : I want to index and search
 shoes, each shoe can have several size and colors, at different prices.

 So, what i want is : when I search for Converse, i want to retrieve one
 shoe per model, i-e one color and one size, but having colors and sizes in
 facets.

 My first idea was to copy SQL behaviour with a SELECT * FROM solr WHERE
 text CONTAINS 'converse' GROUP BY model.
 But no group by in Solr :(. I try with FieldCollapsing, but have many bugs
 (NullPointerException).

 Then I try with multivalued facets  :
 field name=size type=string indexed=true stored=true
 multiValued=true/
 field name=color type=string indexed=true stored=true
 multiValued=true/

 It's nearly working, but i have a problem : when i filtered on red shoes, in
 the size facet, I also have sizes which are not available in red. I don't
 find any solutions to filter multivalued facet with value of another
 multivalued facet.

 So if anyone have an idea for solving this problem...



 Mickael.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/No-group-by-looking-for-an-alternative-tp1022738p1022738.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Lance Norskog
goks...@gmail.com


Re: analysis tool vs. reality

2010-08-04 Thread Lance Norskog
"there is some kind of caching of query results
going on that doesn't get flushed on a restart of tomcat."

Yes. Solr by default has HTTP caching on if there is no configuration,
and the example solrconfig.xml has it configured on. You should edit
solrconfig.xml to use the alternative described in the comments.
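For reference, this is the solrconfig.xml knob being described (a minimal
sketch; the example config ships with the caching variant enabled and this
alternative in the comments):

```xml
<!-- disable HTTP 304/ETag caching so stale results can't survive a restart -->
<httpCaching never304="true" />
```

With never304="true", Solr always serves a full response instead of telling
the client its cached copy is still valid.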

On Wed, Aug 4, 2010 at 7:55 AM, Justin Lolofie jta...@gmail.com wrote:
 Wow, I got to work this morning and my query results now include the
 'ABC12' document. I'm not sure what that means. Either I made a
 mistake in the process I described in the last email (I don't think
 this is the case) or there is some kind of caching of query results
 going on that doesn't get flushed on a restart of tomcat.




 Erik: Yes, I did re-index, if that means adding the document again.
 Here are the exact steps I took:

 1. analysis.jsp: ABC12 does NOT match title ABC12 (however, ABC or 12 does)
 2. changed schema.xml WordDelimiterFilterFactory to catenate-all
 3. restarted tomcat
 4. deleted the document with title ABC12
 5. added the document with title ABC12
 6. query ABC12 does NOT result in the document with title ABC12
 7. analysis.jsp ABC12 DOES match that document now

 Is there any way to see, given an ID, how something is indexed internally?

 Lance: I understand the index/query sections of analysis.jsp. However,
 it operates on text that you enter into the form, not on actual index
 data. Since all my documents have a unique ID, I'd like to supply an
 ID and a query, and get back the same index/query sections - using
 what's actually in the index.


 -- Forwarded message --
 From: Erik Hatcher erik.hatc...@gmail.com
 To: solr-user@lucene.apache.org
 Date: Tue, 3 Aug 2010 22:43:17 -0400
 Subject: Re: analysis tool vs. reality
 Did you reindex after changing the schema?


 On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

    Hi Erik, thank you for replying. So, turning on debugQuery shows
    information about how the query is processed- is there a way to see
    how things are stored internally in the index?

    My query is ABC12. There is a document whose title field is
    ABC12. However, I can only get it to match if I search for ABC or
    12. This was also true in the analysis tool up until recently.
    However, I changed schema.xml and turned on catenate-all in
    WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis
    tool, ABC12 matches ABC12. However, when doing an actual query, it
    does not match.

    Thank you for any help,
    Justin


    -- Forwarded message --
    From: Erik Hatcher erik.hatc...@gmail.com
    To: solr-user@lucene.apache.org
    Date: Tue, 3 Aug 2010 16:50:06 -0400
    Subject: Re: analysis tool vs. reality
    The analysis tool is merely that, but during querying there is also a
    query parser involved.  Adding debugQuery=true to your request will
    give you the parsed query in the response, offering insight into what
    might be going on.  It could be lots of things, from not querying the
    fields you think you are, to a misunderstanding about some text not
    being analyzed (like wildcard clauses).

         Erik

    On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

      Hello,

      I have found the analysis tool in the admin page to be very useful in
      understanding my schema. I've made changes to my schema so that a
      particular case I'm looking at matches properly. I restarted solr,
      deleted the document from the index, and added it again. But still,
      when I do a query, the document does not get returned in the results.

      Does anyone have any tips for debugging this sort of issue? What is
      different between what I see in analysis tool and new documents added
      to the index?

      Thanks,
      Justin




-- 
Lance Norskog
goks...@gmail.com


Re: Indexing fieldvalues with dashes and spaces

2010-08-04 Thread Erick Erickson
I suspect you're running afoul of tokenizers and filters. The parts of your
schema that you published aren't the ones that really count.

What you probably need to look at is the fieldType definitions, i.e. what
analysis is done for, say, text_ws (see <fieldType ...> in your schema).
There you might find things like WordDelimiterFilter with several options,
LowerCaseFilter, etc. Each of these changes what's placed in your index.
Here's a good place to start, although it's not exhaustive:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

The general idea here is that Tokenizers break up the incoming stream
according to various rules, and Filters then (potentially) modify each token
in various ways.
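To make the tokenizer/filter chain concrete, a fieldType definition in
schema.xml looks roughly like this (a sketch along the lines of Solr's
example schema; the exact filters and options are illustrative, not a
prescription for this thread's problem):

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- first, split the incoming stream on whitespace -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- then split/recombine on case changes, digits, and punctuation -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"/>
    <!-- finally, lowercase every token -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Every filter in the chain changes what lands in the index, which is exactly
why a facet over such a field shows fragments rather than the stored value.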

Until you have a firm handle on this process, facets are probably a
distraction. You're better off looking at your index with the admin pages
and/or Luke and/or the LukeRequestHandler.

And do be aware that fields you get back from a request (i.e. a search) are
the stored fields, NOT what's indexed. This may trip you up too...

HTH
Erick

On Wed, Aug 4, 2010 at 5:22 PM, PeterKerk vettepa...@hotmail.com wrote:


 Well the example you provided is 100% relevant to me :)

 I've read the wiki now (SchemaXml,SolrFacetingOverview,Query Syntax,
 SimpleFacetParameters), but still do not have an exact idea of what you
 mean.

 My situation:
 a city field is something that I want users to search on via text input, so
 lets say New Yo would give the results for New York.
 But also a facet Cities is available in which New York is just one of
 the cities that is clickable.

 The other facet is theme, which in my example holds values like
 Gemeentehuis and Strand  Zee, that would not be a thing on which can
 be
 searched via manual input but IS clickable.

 If you look at my schema.xml, do you see stuff im doing that is absolutely
 wrong for the purpose described above? Because as far as I can see the
 documents are indexed correctly (BESIDES the spaces in the fieldvalues).

 Any help is greatly appreciated! :)
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023992.html
 Sent from the Solr - User mailing list archive at Nabble.com.



XML Format

2010-08-04 Thread twojah

<doc>
<int name="AP_AUC_PHOTO_AVAIL">1</int>
<double name="AUC_AD_PRICE">1.0</double>
<int name="AUC_CLIENT_ID">27017</int>
<str name="AUC_DESCR_SHORT">Bracket Ceiling untuk semua merk projector,
panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta</str>
<str name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html</str>
<int name="AUC_ID">607136</int>
<str name="AUC_ISNEGO">Nego</str>
<int name="AUC_LOCATION">7</int>
<str name="AUC_PHOTO">270/27017/bracket_lcd_plasma_3a-1274291780.JPG</str>
<str name="AUC_START">2010-05-19 17:56:45</str>
<str name="AUC_TITLE">[UPDATE] BRACKET Projector dan LCD/PLASMA TV</str>
<int name="AUC_TYPE">21</int>
<int name="PRO_BACKGROUND">0</int>
<int name="PRO_BOLD">0</int>
<int name="PRO_COLOR">0</int>
<int name="PRO_GALLERY">0</int>
<int name="PRO_LINK">0</int>
<int name="PRO_SPONSOR">0</int>
<int name="cat_id_sub">0</int>
<int name="sectioncode">28</int>
</doc>

above is my recent XML list. I can't search for, for example, the word
"bracket" - it returns an empty list. After searching on the internet, I
found out that there is a mistake in my XML schema; I should change the
schema so it will return the list below (see the bolded lines):

<doc>
<int name="AP_AUC_PHOTO_AVAIL">1</int>
<double name="AUC_AD_PRICE">1.0</double>
<int name="AUC_CLIENT_ID">27017</int>
<arr name="AUC_DESCR_SHORT"><str>Bracket Ceiling untuk semua merk projector,
panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta</str></arr>
<str name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html</str>
<int name="AUC_ID">607136</int>
<str name="AUC_ISNEGO">Nego</str>
<int name="AUC_LOCATION">7</int>
<str name="AUC_PHOTO">270/27017/bracket_lcd_plasma_3a-1274291780.JPG</str>
<str name="AUC_START">2010-05-19 17:56:45</str>
<arr name="AUC_TITLE"><str>[UPDATE] BRACKET Projector dan LCD/PLASMA
TV</str></arr>
<int name="AUC_TYPE">21</int>
<int name="PRO_BACKGROUND">0</int>
<int name="PRO_BOLD">0</int>
<int name="PRO_COLOR">0</int>
<int name="PRO_GALLERY">0</int>
<int name="PRO_LINK">0</int>
<int name="PRO_SPONSOR">0</int>
<int name="cat_id_sub">0</int>
<int name="sectioncode">28</int>
</doc>

my question is: how do I change my schema so it will return the list like the
one above with the bolded lines?
thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/XML-Format-tp1024608p1024608.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrj ContentStreamUpdateRequest Slow

2010-08-04 Thread jayendra patil
ContentStreamUpdateRequest seems to read the file contents and transfer them
over HTTP, which slows down the indexing.

Try using StreamingUpdateSolrServer with the stream.file param @
http://wiki.apache.org/solr/SolrPerformanceFactors#Embedded_vs_HTTP_Post

e.g.

SolrServer server = new StreamingUpdateSolrServer("<Solr Server URL>", 20, 8);
UpdateRequest req = new UpdateRequest("/update/extract");
ModifiableSolrParams params = new ModifiableSolrParams();
params.add("stream.file", new String[]{"<local file path>"});
params.set("literal.id", "<value>");
req.setParams(params);
server.request(req);
server.commit();

Regards,
Jayendra

On Wed, Aug 4, 2010 at 3:01 PM, Tod listac...@gmail.com wrote:

 I'm running a slight variation of the example code referenced below and it
 takes a real long time to finally execute.  In fact it hangs for a long time
 at solr.request(up) before finally executing.  Is there anything I can look
 at or tweak to improve performance?

 I am also indexing a local pdf file, there are no firewall issues, solr is
 running on the same machine, and I tried the actual host name in addition to
 localhost but nothing helps.


 Thanks - Tod

 http://wiki.apache.org/solr/ContentStreamUpdateRequestExample



how to take a value from the query result

2010-08-04 Thread twojah

this is my query in the browser navigation toolbar:
http://172.16.17.126:8983/search/select/?q=AUC_ID:607136

and this is the result in the browser page:
...
<doc>
<int name="AP_AUC_PHOTO_AVAIL">1</int>
<double name="AUC_AD_PRICE">1.0</double>
<int name="AUC_CAT">576</int>
<int name="AUC_CLIENT_ID">27017</int>
<str name="AUC_DESCR_SHORT">Bracket Ceiling untuk semua merk projector,
panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta</str>
<str name="AUC_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html</str>
<int name="AUC_ID">607136</int>
<str name="AUC_ISNEGO">Nego</str>
<int name="AUC_LOCATION">7</int>
<str name="AUC_PHOTO">270/27017/bracket_lcd_plasma_3a-1274291780.JPG</str>
<str name="AUC_START">2010-05-19 17:56:45</str>
<str name="AUC_TITLE">[UPDATE] BRACKET Projector dan LCD/PLASMA TV</str>
<int name="AUC_TYPE">21</int>
<int name="PRO_BACKGROUND">0</int>
<int name="PRO_BOLD">0</int>
<int name="PRO_COLOR">0</int>
<int name="PRO_GALLERY">0</int>
<int name="PRO_LINK">0</int>
<int name="PRO_SPONSOR">0</int>
<int name="cat_id_sub">0</int>
<int name="sectioncode">28</int>
</doc>

I want to get the AUC_CAT value (576) and use it in my PHP code. How can I
get that value?
please help
thanks in advance
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-take-a-value-from-the-query-result-tp1025119p1025119.html
Sent from the Solr - User mailing list archive at Nabble.com.
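Any XML parser can pull the AUC_CAT value out of that response (the asker
mentions PHP; in PHP the usual route is requesting wt=json and using
json_decode). A minimal sketch in Java using only the JDK's DOM and XPath
APIs - the inline XML string stands in for the body fetched from the select
URL, and the class name is mine:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class ExtractFacetValue {
    public static void main(String[] args) throws Exception {
        // In a real setup this XML would be fetched from the Solr select URL;
        // here we parse an inline sample with the same shape.
        String xml = "<response><result name=\"response\"><doc>"
                   + "<int name=\"AUC_CAT\">576</int>"
                   + "<int name=\"AUC_ID\">607136</int>"
                   + "</doc></result></response>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        // Pull the text content of the <int name="AUC_CAT"> element.
        XPath xp = XPathFactory.newInstance().newXPath();
        String aucCat = xp.evaluate("//doc/int[@name='AUC_CAT']", doc);
        System.out.println(aucCat); // prints 576
    }
}
```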