[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058241#comment-13058241
 ] 

Dawid Weiss commented on SOLR-2628:
---

I've talked about it a little bit with Bernd and indeed, it seems possible to 
reduce the size of in-memory data structures by an order of magnitude (or even 
two orders of magnitude, we shall see). I'm on vacation for the next week and 
on a business trip for another one after that, but I'll be on it once I come 
back home.

 use of FST for SynonymsFilterFactory and synonyms.txt
 -

 Key: SOLR-2628
 URL: https://issues.apache.org/jira/browse/SOLR-2628
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 3.4, 4.0
 Environment: Linux
Reporter: Bernd Fehling
Assignee: Dawid Weiss
Priority: Minor
  Labels: suggestion

 Currently the SynonymsFilterFactory builds up a memory-based SynonymsMap. 
 This can generate huge maps because of the permutations for synonyms.
 Now that FST (finite state transducer) support has been introduced to Lucene, 
 it could also be used for synonyms.
 A tool can compile the synonyms.txt file to a binary automaton file, which can 
 then be used with SynonymsFilterFactory.
 Advantages:
 - faster start of Solr, no need to generate SynonymsMap
 - faster lookup
 - memory saving
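
To make the "permutations" point concrete, here is a minimal sketch in plain Java. It assumes that every term in a comma-separated synonym group maps to every term of that group, so a group of n terms stores n * n replacement terms. The class and method names are hypothetical and are not Solr's or Lucene's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the actual SynonymsMap implementation.
final class SynonymExpansionDemo {
    // Expand one comma-separated group so that every term maps to the whole group.
    static Map<String, List<String>> expand(String line) {
        List<String> terms = new ArrayList<>();
        for (String t : line.split(",")) {
            terms.add(t.trim());
        }
        Map<String, List<String>> map = new HashMap<>();
        for (String t : terms) {
            map.put(t, new ArrayList<>(terms)); // each key holds its own copy of the group
        }
        return map;
    }

    // Total number of replacement terms held across all map entries.
    static int storedTerms(Map<String, List<String>> map) {
        int total = 0;
        for (List<String> replacements : map.values()) {
            total += replacements.size();
        }
        return total;
    }
}
```

Under this assumption a group of three terms already stores nine replacement terms, and the count grows quadratically with group size.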

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058468#comment-13058468
 ] 

Michael McCandless commented on SOLR-2628:
--

Dawid, have a look at LUCENE-3233 -- we have a [very very rough] start at this.



[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058470#comment-13058470
 ] 

Dawid Weiss commented on SOLR-2628:
---

Yep, this is a duplicate. Thanks, Mike. Like I said -- I won't be able to work 
on this for the next two weeks (I also have that FST refactoring open in the 
background... it's progressing slowly), but it's definitely low-hanging fruit, 
because it shouldn't be very difficult and the gains should be huge.



[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058475#comment-13058475
 ] 

Michael McCandless commented on SOLR-2628:
--

I think the reduction in RAM should be huge, but lookup speed might be slower 
(i.e. the usual tradeoff of FST), since we are going char by char in the FST.  If 
we go word by word (i.e. the FST's labels are word ords and we separately resolve 
word -> ord via a normal hash lookup), then that might be a good middle 
ground... but this is all speculation for now!
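
The word-by-word idea can be sketched roughly as follows. This is plain Java with a hash-map trie standing in for the FST; it is not Lucene's FST API, and `WordOrdSynonyms`, `add` and `lookup` are hypothetical names. Each word is first resolved to an ord with one hash lookup, and the automaton then follows one arc per word rather than one arc per char.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a trie over word ordinals, standing in for an FST.
final class WordOrdSynonyms {
    // A node whose arcs are labelled with word ordinals rather than chars.
    private static final class Node {
        final Map<Integer, Node> arcs = new HashMap<>();
        String output; // replacement text if a rule ends here, otherwise null
    }

    private final Map<String, Integer> wordToOrd = new HashMap<>();
    private final Node root = new Node();

    // Register a rule such as "united states" -> "usa".
    void add(String phrase, String replacement) {
        Node node = root;
        for (String word : phrase.split(" ")) {
            Integer ord = wordToOrd.get(word);
            if (ord == null) {
                ord = wordToOrd.size(); // assign the next free ordinal
                wordToOrd.put(word, ord);
            }
            node = node.arcs.computeIfAbsent(ord, k -> new Node());
        }
        node.output = replacement;
    }

    // Returns the replacement for an exact phrase match, or null if there is none.
    String lookup(List<String> words) {
        Node node = root;
        for (String word : words) {
            Integer ord = wordToOrd.get(word); // one hash lookup per word
            if (ord == null) {
                return null; // unknown word: no rule can match
            }
            node = node.arcs.get(ord); // one arc per word, not per char
            if (node == null) {
                return null;
            }
        }
        return node.output;
    }
}
```

The sketch only handles exact phrase matches. Whether this layout actually beats a char-by-char FST on speed or memory would have to be measured.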




[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt

2011-07-01 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058489#comment-13058489
 ] 

Dawid Weiss commented on SOLR-2628:
---

Yes, this may be the case. It'd need to be investigated because storing words 
in a hashtable will also bump memory requirements, whereas an FST can at least 
reuse some prefixes and suffixes.
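
As a rough illustration of the prefix-reuse point, the sketch below compares the number of characters held when every word is stored as its own key with the number of arcs in a character trie over the same words. It is plain Java with hypothetical names, not Lucene's FST. A trie shares prefixes only, whereas an FST can also share suffixes, and arc counts ignore per-arc and per-entry overhead, so this is a structural comparison and not a memory measurement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of prefix sharing; not Lucene's FST.
final class PrefixSharingDemo {
    private static final class Node {
        final Map<Character, Node> arcs = new HashMap<>();
    }

    // Total characters if every word is stored as its own separate key.
    static int charsStoredSeparately(String[] words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    // Arcs in a character trie over the same words; shared prefixes count once.
    static int trieArcs(String[] words) {
        Node root = new Node();
        int arcs = 0;
        for (String w : words) {
            Node node = root;
            for (char c : w.toCharArray()) {
                Node next = node.arcs.get(c);
                if (next == null) {
                    next = new Node();
                    node.arcs.put(c, next);
                    arcs++; // a new arc is needed only where the prefix diverges
                }
                node = next;
            }
        }
        return arcs;
    }
}
```

For the words "search", "searcher" and "searching", separate storage holds 23 characters while the trie needs 11 arcs, because the common prefix "search" is stored once.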
