Re: Detecting repeated subsequences of identical items

2016-04-20 Thread Vlastimil Brom
2016-04-21 5:07 GMT+02:00 Steven D'Aprano:
> I want to group repeated items in a sequence. For example, I can group
> repeated sequences of a single item at a time using groupby:
>
>
> from itertools import groupby
> for key, group in groupby("AAAABBCDDEEEFFFF"):
>     group = list(group)
>     print(key, "count =", len(group))
>
>
> outputs:
>
> A count = 4
> B count = 2
> C count = 1
> D count = 2
> E count = 3
> F count = 4
>
>
> Now I want to group subsequences. For example, I have:
>
> "ABCABCABCDEABCDEFABCABCABCB"
>
> and I want to group it into repeating subsequences. I can see two ways to
> group it:
>
> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>
> giving counts:
>
> (ABC) count = 2
> (ABCDE) count = 2
> F count = 1
> (ABC) count = 3
> B repeats 1 time
>
>
> or:
>
> ABC ABC ABC D E A B C D E F ABC ABC ABC B
>
> giving counts:
>
> (ABC) count = 3
> D count = 1
> E count = 1
> A count = 1
> B count = 1
> C count = 1
> D count = 1
> E count = 1
> F count = 1
> (ABC) count = 3
> B count = 1
>
>
>
> How can I do this? Does this problem have a standard name and/or solution?
>
>
>
>
> --
> Steven
>
> --
> https://mail.python.org/mailman/listinfo/python-list

Hi,
if I am not missing something, the latter form of grouping might be
achieved with the following regex:

import re

t = "ABCABCABCDEABCDEFABCABCABCB"
# Group 1 is the whole run; group 2 is the repeated unit ('' for a lone item).
grouped = re.findall(r"((?:(\w+?)\2+)|\w+?)", t)
print(grouped)
for grp, subseq in grouped:
    if subseq:
        print(subseq, grp.count(subseq))
    else:
        print(grp, "1")


the printed output is:

[('ABCABCABC', 'ABC'), ('D', ''), ('E', ''), ('A', ''), ('B', ''),
('C', ''), ('D', ''), ('E', ''), ('F', ''), ('ABCABCABC', 'ABC'),
('B', '')]
ABC 3
D 1
E 1
A 1
B 1
C 1
D 1
E 1
F 1
ABC 3
B 1

The former one seems to be more tricky...
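
One possible way to pin down the former grouping is to read it as "split the sequence into the fewest groups, where a group is either a single item or a unit repeated at least twice". That reading is an assumption, not something stated in the thread, and the name fewest_groups is made up for this sketch. Under that assumption a small memoized search is enough:

```python
from functools import lru_cache


def fewest_groups(seq):
    """Split seq into the fewest (unit, count) groups.

    A group is either one item (count 1) or a unit repeated
    back to back at least twice.
    """
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(start):
        if start == n:
            return ()
        # Option 1: the item at `start` stands alone.
        candidates = [((seq[start], 1),) + best(start + 1)]
        # Option 2: some unit starting here repeats 2, 3, ... times.
        for size in range(1, (n - start) // 2 + 1):
            unit = seq[start:start + size]
            count = 1
            while seq[start + count * size:start + (count + 1) * size] == unit:
                count += 1
                candidates.append(((unit, count),) + best(start + count * size))
        # Keep the split with the fewest groups (first one wins on ties).
        return min(candidates, key=len)

    return list(best(0))


groups = fewest_groups("ABCABCABCDEABCDEFABCABCABCB")
for unit, count in groups:
    print(unit, "count =", count)
```

Because every repeat count from 2 upwards is tried, the search is free to stop at (ABC) x 2 and leave the third ABC for the (ABCDE) run, which is what the regex cannot do.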

hth,
   vbr
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Detecting repeated subsequences of identical items

2016-04-20 Thread Michael Selik
On Thu, Apr 21, 2016 at 2:35 AM Michael Selik wrote:

> On Wed, Apr 20, 2016 at 11:11 PM Steven D'Aprano wrote:
>
>> I want to group [repeated] subsequences. For example, I have:
>> "ABCABCABCDEABCDEFABCABCABCB"
>> and I want to group it into repeating subsequences. I can see two
>> ways... How can I do this? Does this problem have a standard name and/or
>> solution?
>>
>
> I'm not aware of a standard name. This sounds like an unsupervised
> learning problem. There's no objectively correct answer unless you add more
> specificity to the problem statement.
>
> Regexes may sound tempting at first, but because a repeating subsequence
> may have nested repeating subsequences and this can go on infinitely, I
> think we at least need a pushdown automaton.
>
> I checked out some links for clustering algorithms that work on series
> subsequences and I found some fun results.
>
> Clustering is meaningless!
> http://www.cs.ucr.edu/~eamonn/meaningless.pdf
>
> I think you're in "no free lunch" territory. "Clustering of subsequence
> time series remains an open issue in time series clustering"
> http://www.hindawi.com/journals/tswj/2014/312521/
>
> Any more detail on the problem to add constraints?
>

Some light reading suggests that you can make your problem more tractable by
defining a minimum size for a subsequence to qualify. One paper suggests
calling these more interesting repetitions "motifs", to use a music metaphor.
Looking for any repetition at all produces too many trivial results. Is that
valid for your usage?
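
As a rough illustration of the minimum-size idea, here is a sketch that counts every substring of at least min_size items and keeps those seen at least min_count times. The function name and both parameters are assumptions made for this example, it only works on strings (slices must be hashable), and overlapping occurrences are counted.

```python
from collections import Counter


def candidate_motifs(seq, min_size=3, min_count=2):
    """Return {substring: occurrences} for substrings of at least
    min_size items that occur at least min_count times.
    Overlapping occurrences are included in the counts."""
    counts = Counter(
        seq[i:i + size]
        for size in range(min_size, len(seq))
        for i in range(len(seq) - size + 1)
    )
    return {motif: n for motif, n in counts.items() if n >= min_count}


motifs = candidate_motifs("ABCABCABCDEABCDEFABCABCABCB", min_size=5)
for motif in sorted(motifs, key=len, reverse=True):
    print(motif, motifs[motif])
```

This only lists candidates; deciding which of them should win when they overlap is the part that still needs extra constraints.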


Re: Detecting repeated subsequences of identical items

2016-04-20 Thread Michael Selik
On Wed, Apr 20, 2016 at 11:11 PM Steven D'Aprano wrote:

> I want to group [repeated] subsequences. For example, I have:
> "ABCABCABCDEABCDEFABCABCABCB"
> and I want to group it into repeating subsequences. I can see two
> ways... How can I do this? Does this problem have a standard name and/or
> solution?
>

I'm not aware of a standard name. This sounds like an unsupervised learning
problem. There's no objectively correct answer unless you add more
specificity to the problem statement.

Regexes may sound tempting at first, but because a repeating subsequence
may have nested repeating subsequences and this can go on infinitely, I
think we at least need a pushdown automaton.

I checked out some links for clustering algorithms that work on series
subsequences and I found some fun results.

Clustering is meaningless!
http://www.cs.ucr.edu/~eamonn/meaningless.pdf

I think you're in "no free lunch" territory. "Clustering of subsequence
time series remains an open issue in time series clustering"
http://www.hindawi.com/journals/tswj/2014/312521/

Any more detail on the problem to add constraints?


Re: Detecting repeated subsequences of identical items

2016-04-20 Thread Chris Angelico
On Thu, Apr 21, 2016 at 1:07 PM, Steven D'Aprano wrote:
> Now I want to group subsequences. For example, I have:
>
> "ABCABCABCDEABCDEFABCABCABCB"
>
> and I want to group it into repeating subsequences. I can see two ways to
> group it:
>
> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>
> or:
>
> ABC ABC ABC D E A B C D E F ABC ABC ABC B

Interesting. I've *almost* managed to (ab)use re.split for this
purpose. A one-step solution can be done with re.match:

>>> txt = "ABCABCABCDEABCDEFABCABCABCB"
>>> re.match(r'(.+)\1+', txt)
<_sre.SRE_Match object; span=(0, 9), match='ABCABCABC'>

But split then returns only the grouped part:

>>> re.split(r'(.+)\1+', txt)
['', 'ABC', 'DEABCDEF', 'ABC', 'B']

or *all* the grouped parts:

>>> re.split(r'((.+)\2+)', txt)
['', 'ABCABCABC', 'ABC', 'DEABCDEF', 'ABCABCABC', 'ABC', 'B']

There's definitely a partial solution happening here, but I can't
quite make it work.
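
One way the partial solution might be finished, sketched with re.finditer instead of re.split: adding "|." as a fallback lets single items through, and a lazy "(.+?)" picks the shortest repeating unit. The helper name repeated_runs is an assumption for this example, and it produces the second grouping only.

```python
import re


def repeated_runs(text):
    """Yield (unit, count) pairs, preferring the shortest repeating unit."""
    # Either a unit followed by one or more copies of itself,
    # or any single character that starts no such run.
    pattern = re.compile(r"(.+?)\1+|.", re.DOTALL)
    for match in pattern.finditer(text):
        unit = match.group(1)
        if unit is None:
            # The fallback branch matched: a lone character.
            yield match.group(0), 1
        else:
            yield unit, len(match.group(0)) // len(unit)


for unit, count in repeated_runs("ABCABCABCDEABCDEFABCABCABCB"):
    print(unit, "count =", count)
```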

And no, I don't know if there's a standard name for it.

ChrisA


Re: Detecting repeated subsequences of identical items

2016-04-20 Thread Ethan Furman

On 04/20/2016 08:57 PM, Ethan Furman wrote:

> [snip same pattern as Steven wrote]

Nevermind.  It's obviously time for me to go to bed.  :/

--
~Ethan~


Re: Detecting repeated subsequences of identical items

2016-04-20 Thread Ethan Furman

On 04/20/2016 08:07 PM, Steven D'Aprano wrote:


> Now I want to group subsequences. For example, I have:
>
> "ABCABCABCDEABCDEFABCABCABCB"
>
> and I want to group it into repeating subsequences. I can see two ways to
> group it:
>
> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>
> giving counts:
>
> (ABC) count = 2
> (ABCDE) count = 2
> F count = 1
> (ABC) count = 3
> B repeats 1 time
>
>
> or
>
> ABC ABC ABC D E A B C D E F ABC ABC B

--
~Ethan~


Re: Running lpr on windows from python

2016-04-20 Thread eryk sun
On Wed, Apr 20, 2016 at 9:58 AM, Tim Golden wrote:
> If it's not, then try copying the lpr.exe to c:\windows\syswow64 and try
> again. (Or to some other place to which you have access).

WOW64 in Windows 7+ has a virtual "SysNative" directory that accesses
the native 64-bit system directory:

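import os
import platform

# 32-bit Python on 64-bit Windows: System32 is redirected, so use SysNative.
if '32bit' in platform.architecture():
    lpr = os.path.join(os.environ['SystemRoot'], 'SysNative', 'lpr.exe')
else:
    lpr = os.path.join(os.environ['SystemRoot'], 'System32', 'lpr.exe')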
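
A sketch of how that path selection could feed the subprocess call from earlier in the thread. The function name lpr_command and the fallback value for SystemRoot are assumptions for this example; the server, printer and file values are the ones quoted by the original poster.

```python
import os
import platform


def lpr_command(server, printer, filename):
    """Build an argument list for lpr.exe, going through SysNative
    when the running Python is a 32-bit build."""
    # Assumption: fall back to C:\Windows if SystemRoot is not set.
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    if "32bit" in platform.architecture():
        subdir = "SysNative"
    else:
        subdir = "System32"
    lpr = os.path.join(system_root, subdir, "lpr.exe")
    return [lpr, "-S", server, "-P", printer, filename]


command = lpr_command("172.28.84.38", "RAW", r"C:\john\myfile")
print(command)
```

The resulting list can be handed to subprocess.Popen as is, without shell=True, which sidesteps cmd.exe and its quoting rules altogether.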


Detecting repeated subsequences of identical items

2016-04-20 Thread Steven D'Aprano
I want to group repeated items in a sequence. For example, I can group
repeated sequences of a single item at a time using groupby:


from itertools import groupby
for key, group in groupby("AAAABBCDDEEEFFFF"):
    group = list(group)
    print(key, "count =", len(group))


outputs:

A count = 4
B count = 2
C count = 1
D count = 2
E count = 3
F count = 4


Now I want to group subsequences. For example, I have:

"ABCABCABCDEABCDEFABCABCABCB"

and I want to group it into repeating subsequences. I can see two ways to
group it:

ABC ABC ABCDE ABCDE F ABC ABC ABC B

giving counts:

(ABC) count = 2
(ABCDE) count = 2
F count = 1
(ABC) count = 3
B repeats 1 time


or:

ABC ABC ABC D E A B C D E F ABC ABC ABC B

giving counts:

(ABC) count = 3
D count = 1
E count = 1
A count = 1
B count = 1
C count = 1
D count = 1
E count = 1
F count = 1
(ABC) count = 3
B count = 1



How can I do this? Does this problem have a standard name and/or solution?




-- 
Steven



Re: Failed install scipy lib

2016-04-20 Thread eryk sun
On Wed, Apr 20, 2016 at 8:04 AM, Oscar Benjamin wrote:
> On 20 April 2016 at 12:30,   wrote:
>
>> from ._ufuncs import *
>>   File "scipy\special\_ufuncs.pyx", line 1, in init scipy.special._ufuncs 
>> (scipy\special\_ufuncs.c:26242)
>> ImportError: DLL load failed: The specified module could not be found.
>
> Maybe this is the problem:
>
> http://stackoverflow.com/questions/36489487/error-of-import-scipy-stats-for-windows-7

Here's the link for the VC++ Redistributable for VS 2015. Try
installing the 32-bit version.

https://www.microsoft.com/en-us/download/details.aspx?id=48145


RE: Xlms namespace

2016-04-20 Thread Joaquin Alzola
>> The problem:
>> test\ntest\ntest<{[£ EURO&%]}>

>If I had to make a guess, you need to escape the <, >, and & characters or else
>they'll get parsed by the XML parser.  Try sending
>"test\ntest\ntest&lt;{[£ EURO&amp;%]}&gt;"

Yes, it is the XML itself.

By sending &amp; and &lt; instead of the raw characters, I can make it work
with the desired characters.

So the result: test\ntest\ntest&lt;{[£&amp;€%]}&gt;

will print < and also &.

Thanks.
This email is confidential and may be subject to privilege. If you are not the 
intended recipient, please do not copy or disclose its content but contact the 
sender immediately upon receipt.


Re: PEP proposal: sequence expansion support for yield statement: yield *

2016-04-20 Thread Steven D'Aprano
On Thu, 21 Apr 2016 05:34 am, Ken Seehart wrote:

> Currently the common pattern for yielding the elements in a sequence is as
> follows:
> 
>   for x in sequence: yield x
> 
> I propose the following replacement (the result would be identical):
> 
>   yield *sequence

Others have already pointed out that this already exists as "yield from
iter(sequence)", but I'd like to say that this syntax was not added merely
to shorten the "for x in sequence: yield x" idiom.

In its simplest case, "yield from expr" is equivalent to "for x in expr:
yield x", and it is completely reasonable to use it for such simple
purposes. But that's not why it was added to the language, and if that's
*all* it did, it probably wouldn't have been.

Rather, "yield from" was added to support the full set of generator
behaviour, including their send(), close() and throw() methods. That
makes "yield from expr" equivalent to this rather formidable chunk of code:



_i = iter(EXPR)
try:
    _y = next(_i)
except StopIteration as _e:
    _r = _e.value
else:
    while 1:
        try:
            _s = yield _y
        except GeneratorExit as _e:
            try:
                _m = _i.close
            except AttributeError:
                pass
            else:
                _m()
            raise _e
        except BaseException as _e:
            _x = sys.exc_info()
            try:
                _m = _i.throw
            except AttributeError:
                raise _e
            else:
                try:
                    _y = _m(*_x)
                except StopIteration as _e:
                    _r = _e.value
                    break
        else:
            try:
                if _s is None:
                    _y = next(_i)
                else:
                    _y = _i.send(_s)
            except StopIteration as _e:
                _r = _e.value
                break
RESULT = _r




See PEP 380 for more info:

https://www.python.org/dev/peps/pep-0380/
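
To make the difference from a plain loop concrete, here is a small sketch in which values sent into an outer generator reach the inner one, and the inner generator's return value becomes the value of the "yield from" expression. The names averager and delegator are invented for this example.

```python
def averager():
    """Running average; sending None ends it and returns the result."""
    total = 0
    count = 0
    average = None
    while True:
        value = yield average
        if value is None:
            return average  # becomes the value of "yield from averager()"
        total += value
        count += 1
        average = total / count


def delegator(results):
    # send() calls made on this generator are forwarded to averager().
    result = yield from averager()
    results.append(result)


results = []
gen = delegator(results)
next(gen)       # prime the generator
gen.send(10)    # forwarded to averager
gen.send(20)
try:
    gen.send(None)  # averager returns, then delegator finishes
except StopIteration:
    pass
print(results)
```

A "for x in averager(): yield x" loop in delegator would lose both the sent values and the returned average.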


-- 
Steven



Re: PEP proposal: sequence expansion support for yield statement: yield *

2016-04-20 Thread Chris Angelico
On Thu, Apr 21, 2016 at 8:26 AM,   wrote:
> Anyway, thanks for the link. And I suppose checking Python 3 for
> implementation would be a good prior step as well! Sadly, "yield from" is not
> in Python 2.7, but its presence in Python 3.3 renders my proposal dead as a
> parrot without a liver.
>

This is what happens when you make proposals. Guido van Rossum has a
time machine, and he'll go back in time, implement the feature, and
quietly come back here :)

Keep on thinking about what would make the language better. Ideas are
great! But do remember to check the latest version of Python
(currently 3.5, with 3.6 in development); no new features will be
added to 2.7. In fact, I'd recommend making the switch to 3.5 as soon
as possible; you'll gain quite a few cool new features.

All the best!

ChrisA


Re: PEP proposal: sequence expansion support for yield statement: yield *

2016-04-20 Thread kenseehart
On Wednesday, April 20, 2016 at 1:00:45 PM UTC-7, Ethan Furman wrote:
> On 04/20/2016 12:34 PM, Ken Seehart wrote:
> 
> New ideas for Python are typically vetted on Python Ideas. [1]
> 
> > Currently the common pattern for yielding the elements in a sequence
> > is as follows:
> >
> >    for x in sequence: yield x
> >
> > I propose the following replacement (the result would be identical):
> >
> >    yield *sequence
> >
> > The semantics are somewhat different from argument expansion (from
> > which the syntax is borrowed), but intuitive: yield all of the elements
> > of a sequence (as opposed to yield the sequence as a single item).
> 
> Your examples do not make clear what your result should be.  If you mean 
> the results are exactly the same you can get that behavior with
> 
>  yield from iter(x)
> 
> which, while being slightly longer, has the advantage of already 
> working.  ;)
> 
> --
> ~Ethan~
> 
> [1] https://mail.python.org/mailman/listinfo/python-ideas

To be clear, the comment "...(the result would be identical)" is indicative 
that the result would be identical, meaning "exactly the same".

Anyway, thanks for the link. And I suppose checking Python 3 for implementation 
would be a good prior step as well! Sadly, "yield from" is not in Python 2.7, 
but its presence in Python 3.3 renders my proposal dead as a parrot without a 
liver.

Regards,
Ken


Re: Running lpr on windows from python

2016-04-20 Thread Stephen Hansen
On Wed, Apr 20, 2016, at 06:57 AM, loial wrote:
> process = subprocess.Popen(commandline, shell=True,
> stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> 
> where command line is
> C:/windows/system32/lpr.exe -S 172.28.84.38 -P RAW C:/john/myfile

Try making command line:
commandline = r"C:\windows\system32\lpr.exe -S 172.28.84.38 -P RAW C:\john\myfile"

The r in front of the string makes it a raw string so you don't have to
double up the backslashes.

---
Stephen Hansen
  m e @ i x o k a i . i o


Re: PEP proposal: sequence expansion support for yield statement: yield *

2016-04-20 Thread Alan Evangelista



> Currently the common pattern for yielding the elements in a sequence is as
> follows:
>
>    for x in sequence: yield x
>
> I propose the following replacement (the result would be identical):
>
>    yield *sequence

IMHO the current syntax is much more intuitive; it is obvious what it does
just by looking at it. I favor a more intuitive syntax over a more concise one.


Regards,
Alan Evangelista


Re: PEP proposal: sequence expansion support for yield statement: yield *

2016-04-20 Thread Ethan Furman

On 04/20/2016 12:34 PM, Ken Seehart wrote:

New ideas for Python are typically vetted on Python Ideas. [1]


> Currently the common pattern for yielding the elements in a sequence
> is as follows:
>
>    for x in sequence: yield x
>
> I propose the following replacement (the result would be identical):
>
>    yield *sequence
>
> The semantics are somewhat different from argument expansion (from
> which the syntax is borrowed), but intuitive: yield all of the elements
> of a sequence (as opposed to yield the sequence as a single item).

Your examples do not make clear what your result should be.  If you mean 
the results are exactly the same you can get that behavior with


yield from iter(x)

which, while being slightly longer, has the advantage of already 
working.  ;)


--
~Ethan~

[1] https://mail.python.org/mailman/listinfo/python-ideas


Re: PEP proposal: sequence expansion support for yield statement: yield *

2016-04-20 Thread Random832
On Wed, Apr 20, 2016, at 15:34, Ken Seehart wrote:
> Currently the common pattern for yielding the elements in a sequence is
> as follows:
> 
>   for x in sequence: yield x
> 
> I propose the following replacement (the result would be identical):
> 
>   yield *sequence

yield from sequence
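
For the recursive-generator case mentioned in the proposal, a minimal sketch of how "yield from" reads in Python 3.3+. The flatten function is an invented example, and treating only lists and tuples as nested containers is an assumption made to keep it short.

```python
def flatten(items):
    """Yield the leaves of arbitrarily nested lists and tuples."""
    for item in items:
        if isinstance(item, (list, tuple)):
            # Delegate to the recursive call instead of looping over it.
            yield from flatten(item)
        else:
            yield item


print(list(flatten([1, [2, (3, 4)], 5])))
```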


PEP proposal: sequence expansion support for yield statement: yield *

2016-04-20 Thread Ken Seehart
Currently the common pattern for yielding the elements in a sequence is as 
follows:

  for x in sequence: yield x

I propose the following replacement (the result would be identical):

  yield *sequence

The semantics are somewhat different from argument expansion (from which the 
syntax is borrowed), but intuitive: yield all of the elements of a sequence (as 
opposed to yield the sequence as a single item). This doesn't appear to have 
any syntactical collisions, as it is currently a syntax error.

Motivation: More compact notation, and the compiler can produce more efficient 
bytecode than the former representation (the loop overhead is omitted). This 
pattern is very common in recursive generators, so a compact notation would be 
nice.

Also, there is precedent: the proposed notation is implemented in JavaScript 
with identical semantics (though in JavaScript, the conventional spacing is 
different: yield* sequence ).

Examples:
  yield *(1,2,3)
... instead of :
  yield 1; yield 2; yield 3
... or:
  for x in (1,2,3): yield x


  yield *chain(seq1, seq2)
... instead of :
  for x in chain(seq1, seq2): yield x

~ Ken Seehart



Re: Xlms namespace

2016-04-20 Thread sohcahtoa82
On Wednesday, April 20, 2016 at 10:05:02 AM UTC-7, Joaquin Alzola wrote:
> Hi Guys
> 
> I am currently doing this:
> 
> IP client(Python) --> send SOAPXML request --> IP Server (Python)
> 
> SOAP request:
> http://schemas.xmlsoap.org/soap/envelope/"
> xmlns:req="http://request.messagepush.interfaces.comviva.com"
> xmlns:xsd="http://request.messagepush.interfaces.comviva.com/xsd">
> test\ntest\ntest<{[£ EURO&%]}>
> 
> From my IP Client:
> s.send(data_send.encode('utf-8'))
> 
> From my IPServer:
> xml_decoded = data.decode('utf-8')
> xml_root = ET.ElementTree(ET.fromstring(xml_decoded)).getroot()
> for elem in xml_root.getiterator():
>     if('{http://request.messagepush.interfaces.comviva.com/xsd}shortCode'==elem.tag):
>         shortCode = (elem.text).rstrip()
>     if('{http://request.messagepush.interfaces.comviva.com/xsd}text'==elem.tag):
>         send_text = (elem.text).rstrip()
>     if('{http://request.messagepush.interfaces.comviva.com/xsd}item'==elem.tag):
>         subscribers = (elem.text).rstrip()
> result_sms = send_sms(subscribers,shortCode,send_text)
> 
> It is working fine but I am having problems with a couple of special 
> characters, & and <
> 
> The problem:
> test\ntest\ntest<{[£ EURO&%]}>
> 
> It seems as if I send this: <> and the character & then I have a problem.
> I need to use utf-8 as I need to make sure I get 160 characters in one SMS.
> 
> Error:
> Traceback (most recent call last):
>   File "./ipserver.py", line 52, in <module>
>     main()
>   File "./ipserver.py", line 36, in main
>     xml_root = ET.ElementTree(ET.fromstring(xml_decoded)).getroot()
>   File "/usr/lib64/python3.4/xml/etree/ElementTree.py", line 1325, in XML
>     parser.feed(text)
> xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 19, column 48

If I had to make a guess, you need to escape the <, >, and & characters or else 
they'll get parsed by the XML parser.  Try sending 
"test\ntest\ntest&lt;{[£ EURO&amp;%]}&gt;"
-- 
https://mail.python.org/mailman/listinfo/python-list
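
Rather than escaping by hand, the client could let the standard library do it. This is a minimal sketch: the element name "text" without a namespace and the helper name build_text_element are assumptions for illustration, not the poster's actual SOAP layout.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_text_element(message):
    """Wrap message in a <text> element with &, < and > escaped."""
    # escape() turns & into &amp;, < into &lt; and > into &gt;
    return "<text>{}</text>".format(escape(message))


message = "test\ntest\ntest<{[£€&%]}>"
document = build_text_element(message)
parsed = ET.fromstring(document)
print(document)
print(parsed.text == message)
```

The parser undoes the escaping on the server side, so elem.text comes back with the original characters and the length of the SMS text is unchanged.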


Xlms namespace

2016-04-20 Thread Joaquin Alzola
Hi Guys

I am currently doing this:

IP client(Python) --> send SOAPXML request --> IP Server (Python)

SOAP request:
http://schemas.xmlsoap.org/soap/envelope/"
xmlns:req="http://request.messagepush.interfaces.comviva.com"
xmlns:xsd="http://request.messagepush.interfaces.comviva.com/xsd">
test\ntest\ntest<{[£€&%]}>

From my IP Client:
s.send(data_send.encode('utf-8'))

From my IPServer:
xml_decoded = data.decode('utf-8')
xml_root = ET.ElementTree(ET.fromstring(xml_decoded)).getroot()
for elem in xml_root.getiterator():
    if('{http://request.messagepush.interfaces.comviva.com/xsd}shortCode'==elem.tag):
        shortCode = (elem.text).rstrip()
    if('{http://request.messagepush.interfaces.comviva.com/xsd}text'==elem.tag):
        send_text = (elem.text).rstrip()
    if('{http://request.messagepush.interfaces.comviva.com/xsd}item'==elem.tag):
        subscribers = (elem.text).rstrip()
result_sms = send_sms(subscribers,shortCode,send_text)

It is working fine but I am having problems with a couple of special 
characters, & and <

The problem:
test\ntest\ntest<{[£€&%]}>

It seems as if I send this: <> and the character & then I have a problem.
I need to use utf-8 as I need to make sure I get 160 characters in one SMS.

Error:
Traceback (most recent call last):
  File "./ipserver.py", line 52, in <module>
    main()
  File "./ipserver.py", line 36, in main
    xml_root = ET.ElementTree(ET.fromstring(xml_decoded)).getroot()
  File "/usr/lib64/python3.4/xml/etree/ElementTree.py", line 1325, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 19, column 48


Re: Creating Dict of Dict of Lists with joblib and Multiprocessing

2016-04-20 Thread Michael Selik
On Wed, Apr 20, 2016 at 10:50 AM Sims, David (NIH/NCI) [C] <
david.si...@nih.gov> wrote:

> Hi,
>
> Cross posted at
> http://stackoverflow.com/questions/36726024/creating-dict-of-dicts-with-joblib-and-multiprocessing,
> but thought I'd try here too as no responses there so far.
>
> A bit new to python and very new to parallel processing in python.  I have
> a script that will process a datafile and generate a dict of dicts.
> However, as I need to run this task on hundreds to thousands of these files
> and ultimately collate the data, I thought parallel processing made a lot
> of sense.  However, I can't seem to figure out how to create a data
> structure.  Minimal script without all the helper functions:
>
> #!/usr/bin/python
> import sys
> import os
> import re
> import subprocess
> import multiprocessing
> from joblib import Parallel, delayed
> from collections import defaultdict
> from pprint import pprint
>
> def proc_vcf(vcf,results):
>     # rstrip('.vcf') strips trailing characters, not the suffix
>     sample_name = os.path.splitext(vcf)[0]
>     results.setdefault(sample_name, {})
>
>     # Run Helper functions 'run_cmd()' and 'parse_variant_data()' to
>     # generate a list of entries. Expect a dict of dict of lists
>     all_vars = run_cmd('vcfExtractor',vcf)
>     results[sample_name]['all_vars'] = parse_variant_data(all_vars,'all')
>
>     # Run Helper functions 'run_cmd()' and 'parse_variant_data()' to
>     # generate a different list of data based on a different set of criteria.
>     mois = run_cmd('moi_report', vcf)
>     results[sample_name]['mois'] = parse_variant_data(mois, 'moi')
>     return results
>
> def main():
>     input_files = sys.argv[1:]
>
>     # collected_data = defaultdict(lambda: defaultdict(dict))
>     collected_data = {}
>
>     # Parallel Processing version
>     # num_cores = multiprocessing.cpu_count()
>     # Parallel(n_jobs=num_cores)(delayed(proc_vcf)(vcf,collected_data) for vcf in input_files)
>
>     # for vcf in input_files:
>     #     proc_vcf(vcf, collected_data)
>
>     pprint(dict(collected_data))
>     return
>
> if __name__=="__main__":
>     main()
>
>
> Hard to provide source data as it's very large, but basically, the dataset
> will generate a dict of dicts of lists that contain two sets of data for
> each input keyed by sample and data type:
>
> { 'sample1' : {
>       'all_vars' : [
>           'data_val1',
>           'data_val2',
>           'etc'],
>       'mois' : [
>           'data_val_x',
>           'data_val_y',
>           'data_val_z']
>   },
>   'sample2' : {
>       'all_vars' : [
>           .
>           .
>           .
>       ]
>   }
> }
>
> If I run it without trying to multiprocess, not a problem.  I can't figure
> out how to parallelize this and create the same data structure.  I've tried
> to use defaultdict to create a defaultdict in main() to pass along, as well
> as a few other iterations, but I can't seem to get it right (getting key
> errors, pickle errors, etc.).  Can anyone help me with the proper way to do
> this?  I think I'm not making / initializing / working with the data
> structure correctly, but maybe my whole approach is ill conceived?
>


Processes do not share memory, so each subprocess works on its own copy of
collected_data, made at the time you pass it in; whatever a worker adds to
its copy never reaches the dict in the parent. There's an undocumented
ThreadPool that works the same as the process Pool
(https://docs.python.org/3.5/library/multiprocessing.html#using-a-pool-of-workers).

ThreadPool will share memory across your subthreads. In the example I linked
to, just replace ``from multiprocessing import Pool`` with ``from
multiprocessing.pool import ThreadPool``.

How compute-intensive is your task? If it's mostly disk-read-intensive
rather than compute-intensive, then threads are all you need.
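
A minimal sketch of the ThreadPool variant. Here proc_vcf is a hypothetical stand-in that returns placeholder data instead of calling the real run_cmd() and parse_variant_data() helpers, and collect and workers are names invented for this example. Each call returns its own (sample, data) pair and the pairs are merged afterwards, so the workers never touch a shared dict.

```python
import os
from multiprocessing.pool import ThreadPool


def proc_vcf(vcf):
    """Hypothetical stand-in for the real per-file processing."""
    sample_name = os.path.splitext(os.path.basename(vcf))[0]
    # Placeholder values; the real script would fill these from
    # run_cmd() and parse_variant_data().
    data = {"all_vars": ["data_val1"], "mois": ["data_val_x"]}
    return sample_name, data


def collect(input_files, workers=4):
    """Process every file in a thread pool and merge the results."""
    with ThreadPool(workers) as pool:
        # map() returns the (sample_name, data) pairs in input order.
        return dict(pool.map(proc_vcf, input_files))


collected_data = collect(["sample1.vcf", "sample2.vcf"])
print(sorted(collected_data))
```

Returning results instead of mutating an argument has the side benefit that the same code keeps working if ThreadPool is later swapped for a process Pool, since return values are sent back to the parent.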


Re: Failed install scipy lib

2016-04-20 Thread Gonzalo V
Oscar,
you'd be better off just installing Anaconda and being done with it.

Regards,
Gonzalo

2016-04-20 10:04 GMT-03:00 Oscar Benjamin:

> On 20 April 2016 at 12:30,   wrote:
> > On Wednesday, April 20, 2016 at 2:09:10 PM UTC+3, liran@gmail.com
> wrote:
> >> On Tuesday, April 19, 2016 at 9:21:42 PM UTC+3, eryk sun wrote:
> >> > On Tue, Apr 19, 2016 at 12:05 PM, Oscar Benjamin
> >> >  wrote:
> >> > > On 19 Apr 2016 17:01,  wrote:
> >> > >>
> >> > >> i'm trying to use:
> >> > >> "py -m pip install scipy"
> >> > >> and after couple of lines a get an error saying:
> >> > >
> >> > > I thought that binary wheels for scipy would be available on pypi
> for each
> >> > > OS now. Try updating pip and then using it to install scipy.
> >> >
> >> > PyPI only has Windows wheels for NumPy, not SciPy. You can use
> >> > Christoph Gohlke's unofficial packages:
> >> >
> >> > http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
> >> > http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
> >>
> >> Tried to install those packages and I'm getting the error:
> >>
> >> scipy-0.17.0-cp35-none-win_amd64.whl is not a supported wheel on this
> platform.
> >>
> >> i've also tried it with the versions:
> >> scipy-0.17.0-cp27-none-win_amd64.whl is not a supported wheel on this
> platform.
> >>
> >> scipy-0.17.0-cp34-none-win_amd64.whl is not a supported wheel on this
> platform.
> >>
> >>
> >> I'm using win 10 x64. And python 3.5
> >
> > Ok i got it to install with the version:
> > scipy-0.17.0-cp35-none-win32.whl
> > I've also install this version:
> > numpy-1.11.0+mkl-cp35-cp35m-win32.whl
> >
> > But when i try to run:
> > >>> from scipy.stats.stats import pearsonr
> >
> > I get this error:
> > Traceback (most recent call last):
> >   File "", line 1, in 
> > from scipy.stats.stats import pearsonr
> >   File
> "C:\Users\Liran\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\stats\__init__.py",
> line 338, in 
> > from .stats import *
> >   File
> "C:\Users\Liran\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\stats\stats.py",
> line 180, in 
> > import scipy.special as special
> >   File
> "C:\Users\Liran\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\special\__init__.py",
> line 627, in 
> > from ._ufuncs import *
> >   File "scipy\special\_ufuncs.pyx", line 1, in init
> scipy.special._ufuncs (scipy\special\_ufuncs.c:26242)
> > ImportError: DLL load failed: The specified module could not be found.
>
> Maybe this is the problem:
>
>
> http://stackoverflow.com/questions/36489487/error-of-import-scipy-stats-for-windows-7
>
> --
> Oscar
> --
> https://mail.python.org/mailman/listinfo/python-list
>


Re: Running lpr on windows from python

2016-04-20 Thread Tim Golden
On 20/04/2016 15:21, loial wrote:
> As I said, the lpr command works fine from the command prompt but not from 
> python.

Sorry; I did miss that.

> Everything is 64-bit (windows server 2012).
> 

Is the Python installation also 64-bit?

c:\python27\python.exe -c "import platform; print platform.architecture()"

If it is, then I'm not sure what's going on.

If it's not, then try copying the lpr.exe to c:\windows\syswow64 and try
again. (Or to some other place to which you have access).

TJG


Creating Dict of Dict of Lists with joblib and Multiprocessing

2016-04-20 Thread Sims, David (NIH/NCI) [C]
Hi,

Cross posted at 
http://stackoverflow.com/questions/36726024/creating-dict-of-dicts-with-joblib-and-multiprocessing,
 but thought I'd try here too as no responses there so far.

A bit new to python and very new to parallel processing in python.  I have a 
script that will process a datafile and generate a dict of dicts.  However, as 
I need to run this task on hundreds to thousands of these files and ultimately 
collate the data, I thought parallel processing made a lot of sense.  However, 
I can't seem to figure out how to create a data structure.  Minimal script 
without all the helper functions:

#!/usr/bin/python
import sys
import os
import re
import subprocess
import multiprocessing
from joblib import Parallel, delayed
from collections import defaultdict
from pprint import pprint

def proc_vcf(vcf,results):
    # rstrip('.vcf') strips trailing characters, not the suffix
    sample_name = os.path.splitext(vcf)[0]
    results.setdefault(sample_name, {})

    # Run Helper functions 'run_cmd()' and 'parse_variant_data()' to generate a
    # list of entries. Expect a dict of dict of lists
    all_vars = run_cmd('vcfExtractor',vcf)
    results[sample_name]['all_vars'] = parse_variant_data(all_vars,'all')

    # Run Helper functions 'run_cmd()' and 'parse_variant_data()' to generate a
    # different list of data based on a different set of criteria.
    mois = run_cmd('moi_report', vcf)
    results[sample_name]['mois'] = parse_variant_data(mois, 'moi')
    return results

def main():
    input_files = sys.argv[1:]

    # collected_data = defaultdict(lambda: defaultdict(dict))
    collected_data = {}

    # Parallel Processing version
    # num_cores = multiprocessing.cpu_count()
    # Parallel(n_jobs=num_cores)(delayed(proc_vcf)(vcf,collected_data) for vcf in input_files)

    # for vcf in input_files:
    #     proc_vcf(vcf, collected_data)

    pprint(dict(collected_data))
    return

if __name__=="__main__":
    main()


Hard to provide source data as it's very large, but basically, the dataset will 
generate a dict of dicts of lists that contain two sets of data for each input 
keyed by sample and data type:

{ 'sample1' : {
      'all_vars' : [
          'data_val1',
          'data_val2',
          'etc'],
      'mois' : [
          'data_val_x',
          'data_val_y',
          'data_val_z']
  },
  'sample2' : {
      'all_vars' : [
          .
          .
          .
      ]
  }
}

If I run it without trying to multiprocess, not a problem.  I can't figure out 
how to parallelize this and create the same data structure.  I've tried to use 
defaultdict to create a defaultdict in main() to pass along, as well as a few 
other iterations, but I can't seem to get it right (getting key errors, pickle 
errors, etc.).  Can anyone help me with the proper way to do this?  I think I'm 
not making / initializing / working with the data structure correctly, but 
maybe my whole approach is ill conceived?


Re: Running lpr on windows from python

2016-04-20 Thread loial
I get the same issue if I just specify "lpr" rather than a full path, i.e. it 
works from the command prompt (with forward slashes), but not from Python.


Re: Running lpr on windows from python

2016-04-20 Thread loial
As I said, the lpr command works fine from the command prompt but not from 
python.

Everything is 64-bit (windows server 2012).



Re: Running lpr on windows from python

2016-04-20 Thread Tim Golden
On 20/04/2016 14:57, loial wrote:
> I am trying to run lpr from python 2.7.10 on windows
> 
> However I always get the error
> 'C:/windows/system32/lpr.exe ' is not recognized as an internal or external 
> command,
> operable program or batch file.
> 
> Even though typing the same at the  command prompt works OK
> 
> 
> Any ideas?
> 
> I am using subprocess as follows
> 
> process = subprocess.Popen(commandline, shell=True, stdout=subprocess.PIPE, 
> stderr=subprocess.PIPE)
> 
> where command line is
> C:/windows/system32/lpr.exe -S 172.28.84.38 -P RAW C:/john/myfile
> 

Ummm.. Do you actually have a program called lpr.exe in that location?
It's not usual on Windows. (I rather assume you do since you give the
full path, but still...)

IOW, what happens if you type:

  dir C:\windows\system32\lpr.exe

at a command prompt?

Also: are you on a 64-bit system? If so, c:\windows\system32 probably
isn't where you think it is. cf, for example:

  https://mail.python.org/pipermail/python-win32/2012-March/012121.html
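
The same check can be made from inside Python, which matters because a 32-bit process on 64-bit Windows is shown a redirected System32. A rough sketch, assuming the default C:\Windows location; the function names are made up for illustration. Sysnative is the alias Windows exposes to 32-bit processes for the real 64-bit System32 directory.

```python
import ntpath
import os


def lpr_candidates(windir=r"C:\Windows"):
    """Paths worth checking for lpr.exe, in order."""
    return [
        ntpath.join(windir, "System32", "lpr.exe"),
        # Sysnative is only visible to 32-bit processes on 64-bit Windows.
        ntpath.join(windir, "Sysnative", "lpr.exe"),
    ]


def find_lpr(windir=r"C:\Windows"):
    """Return the first candidate that exists as seen by this process, else None."""
    for path in lpr_candidates(windir):
        if os.path.isfile(path):
            return path
    return None


print(find_lpr())
```

If this prints the Sysnative path, the interpreter is most likely a 32-bit build and the plain System32 path is being redirected.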

TJG


Re: Running lpr on windows from python

2016-04-20 Thread Random832
On Wed, Apr 20, 2016, at 09:57, loial wrote:
> I am trying to run lpr from python 2.7.10 on windows
> 
> However I always get the error
> 'C:/windows/system32/lpr.exe ' is not recognized as an internal or
> external command,
> operable program or batch file.
> 
> Even though typing the same at the  command prompt works OK

It does? This command doesn't exist on my machine. It's not a standard
part of windows.

Just to check though, are you 64-bit windows, and 32 or 64 bit python?
(To find out what kind of windows, go to the system control panel - to
find out if python is 64-bit look at the value of sys.maxsize, it's
2147483647 on 32-bit systems and 9223372036854775807 on 64-bit)
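
For reference, the interpreter check described above fits in a few lines. This is only a sketch with made-up function names; it reports the bitness of the running Python, not of Windows itself.

```python
import struct
import sys


def python_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def maxsize_says_64bit():
    """The sys.maxsize test: larger than 2**32 only on 64-bit builds."""
    return sys.maxsize > 2**32


print(python_bits(), maxsize_says_64bit())
```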

> Any ideas?
> 
> I am using subprocess as follows
> 
> process = subprocess.Popen(commandline, shell=True,
> stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> 
> where command line is
> C:/windows/system32/lpr.exe -S 172.28.84.38 -P RAW C:/john/myfile

If all you want to do is print a text file, see
http://www.robvanderwoude.com/printfiles.php - these commands may not
let you do whatever you're trying to do with that IP address though.


Re: Running lpr on windows from python

2016-04-20 Thread Chris Angelico
On Wed, Apr 20, 2016 at 11:57 PM, loial  wrote:
> I am trying to run lpr from python 2.7.10 on windows
>
> However I always get the error
> 'C:/windows/system32/lpr.exe ' is not recognized as an internal or external 
> command,
> operable program or batch file.
>
> Even though typing the same at the  command prompt works OK
>
>
> Any ideas?
>
> I am using subprocess as follows
>
> process = subprocess.Popen(commandline, shell=True, stdout=subprocess.PIPE, 
> stderr=subprocess.PIPE)
>
> where command line is
> C:/windows/system32/lpr.exe -S 172.28.84.38 -P RAW C:/john/myfile

You're running that through the shell, which means you have to abide
by shell rules. I don't have a Windows machine handy, but I'm pretty sure its
shell isn't happy with forward slashes in the command line; I might be
wrong there.

My recommendation: Split that into separate arguments, pass them as a
list, and remove shell=True. And unless you need to be completely
explicit for some reason (eg to protect against path-based exploits),
cut the first argument to just "lpr" and let the binary be found
anywhere.
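
A minimal sketch of that recommendation, using the arguments from the original post. The helper names are invented for illustration, and whether "lpr" is found on the PATH depends on the machine.

```python
import subprocess


def build_lpr_command(server, printer, path, program="lpr"):
    """Return the command as a list of separate arguments (no shell parsing)."""
    return [program, "-S", server, "-P", printer, path]


def run_lpr(server, printer, path):
    """Run lpr without shell=True and return (returncode, stdout, stderr)."""
    process = subprocess.Popen(
        build_lpr_command(server, printer, path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = process.communicate()
    return process.returncode, out, err


command = build_lpr_command("172.28.84.38", "RAW", r"C:\john\myfile")
print(command)
```

With a list and no shell, the arguments reach the program as given, so quoting and slash direction are no longer interpreted by cmd.exe.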

ChrisA


Running lpr on windows from python

2016-04-20 Thread loial
I am trying to run lpr from python 2.7.10 on windows

However I always get the error
'C:/windows/system32/lpr.exe ' is not recognized as an internal or external 
command,
operable program or batch file.

Even though typing the same at the  command prompt works OK


Any ideas?

I am using subprocess as follows

process = subprocess.Popen(commandline, shell=True, stdout=subprocess.PIPE, 
stderr=subprocess.PIPE)

where command line is
C:/windows/system32/lpr.exe -S 172.28.84.38 -P RAW C:/john/myfile


Re: Failed install scipy lib

2016-04-20 Thread Oscar Benjamin
On 20 April 2016 at 12:30,   wrote:
> On Wednesday, April 20, 2016 at 2:09:10 PM UTC+3, liran@gmail.com wrote:
>> On Tuesday, April 19, 2016 at 9:21:42 PM UTC+3, eryk sun wrote:
>> > On Tue, Apr 19, 2016 at 12:05 PM, Oscar Benjamin
>> >  wrote:
>> > > On 19 Apr 2016 17:01,  wrote:
>> > >>
>> > >> i'm trying to use:
>> > >> "py -m pip install scipy"
>> > >> and after couple of lines a get an error saying:
>> > >
>> > > I thought that binary wheels for scipy would be available on pypi for 
>> > > each
>> > > OS now. Try updating pip and then using it to install scipy.
>> >
>> > PyPI only has Windows wheels for NumPy, not SciPy. You can use
>> > Christoph Gohlke's unofficial packages:
>> >
>> > http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
>> > http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
>>
>> Tried to install those packages and I'm getting the error:
>>
>> scipy-0.17.0-cp35-none-win_amd64.whl is not a supported wheel on this 
>> platform.
>>
>> i've also tried it with the versions:
>> scipy-0.17.0-cp27-none-win_amd64.whl is not a supported wheel on this 
>> platform.
>>
>> scipy-0.17.0-cp34-none-win_amd64.whl is not a supported wheel on this 
>> platform.
>>
>>
>> I'm using win 10 x64. And python 3.5
>
> Ok, I got it to install with the version:
> scipy-0.17.0-cp35-none-win32.whl
> I've also installed this version:
> numpy-1.11.0+mkl-cp35-cp35m-win32.whl
>
> But when i try to run:
> >>> from scipy.stats.stats import pearsonr
>
> I get this error:
> Traceback (most recent call last):
>   File "", line 1, in 
> from scipy.stats.stats import pearsonr
>   File 
> "C:\Users\Liran\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\stats\__init__.py",
>  line 338, in 
> from .stats import *
>   File 
> "C:\Users\Liran\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\stats\stats.py",
>  line 180, in 
> import scipy.special as special
>   File 
> "C:\Users\Liran\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\special\__init__.py",
>  line 627, in 
> from ._ufuncs import *
>   File "scipy\special\_ufuncs.pyx", line 1, in init scipy.special._ufuncs 
> (scipy\special\_ufuncs.c:26242)
> ImportError: DLL load failed: The specified module could not be found.

Maybe this is the problem:

http://stackoverflow.com/questions/36489487/error-of-import-scipy-stats-for-windows-7

--
Oscar


Just-in-Time Static Type Checking for Dynamic Languages

2016-04-20 Thread Neal Becker
I saw this article, which might interest some of you.  It discusses 
application to ruby, but perhaps might have ideas useful for python.

https://arxiv.org/abs/1604.03641



Re: djqgrid 0.2.4 error import json_helpers

2016-04-20 Thread Chris Angelico
On Wed, Apr 20, 2016 at 9:00 PM, asimkon .  wrote:
> Inside my templatetags folder  (djqgrid.py),  i have the following commands:
>
> from django import template
> from djqgrid import json_helpers
>
> Any idea to get a solution ?
>

You're attempting to import from yourself, there. Is that what you
intend? Or should you be using a different name for one of those?
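
If that diagnosis applies (Python 2, where "from djqgrid import ..." inside djqgrid/templatetags/djqgrid.py is first tried relative to the templatetags package and so finds the module itself), one possible fix is to put "from __future__ import absolute_import" at the top of the tag module. The sketch below does not use Django at all: it builds a throwaway package with the same layout under a hypothetical name, gridpkg_demo, and shows the inner module reaching the top-level helper once imports are absolute.

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root, name="gridpkg_demo"):
    """Create <name>/json_helpers.py and <name>/templatetags/<name>.py under root."""
    pkg = os.path.join(root, name)
    tags = os.path.join(pkg, "templatetags")
    os.makedirs(tags)
    files = {
        os.path.join(pkg, "__init__.py"): "",
        os.path.join(pkg, "json_helpers.py"): "MARK = 'top-level helper'\n",
        os.path.join(tags, "__init__.py"): "",
        # The inner module shares the package's name, as in the djqgrid layout.
        os.path.join(tags, name + ".py"): (
            "from __future__ import absolute_import\n"
            "from {0} import json_helpers\n".format(name)
        ),
    }
    for path, text in files.items():
        with open(path, "w") as handle:
            handle.write(text)
    return name


def import_inner_module(root, name):
    """Import <name>.templatetags.<name> with root temporarily on sys.path."""
    sys.path.insert(0, root)
    try:
        return importlib.import_module("{0}.templatetags.{0}".format(name))
    finally:
        sys.path.remove(root)


demo_root = tempfile.mkdtemp()
demo_name = build_demo_package(demo_root)
inner = import_inner_module(demo_root, demo_name)
print(inner.json_helpers.MARK)
```

On Python 3 imports are already absolute, so the __future__ line is a no-op there; renaming the tag module is the other obvious way out.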

ChrisA


Re: Failed install scipy lib

2016-04-20 Thread liran . maymoni
On Wednesday, April 20, 2016 at 2:09:10 PM UTC+3, liran@gmail.com wrote:
> On Tuesday, April 19, 2016 at 9:21:42 PM UTC+3, eryk sun wrote:
> > On Tue, Apr 19, 2016 at 12:05 PM, Oscar Benjamin
> >  wrote:
> > > On 19 Apr 2016 17:01,  wrote:
> > >>
> > >> i'm trying to use:
> > >> "py -m pip install scipy"
> > >> and after couple of lines a get an error saying:
> > >
> > > I thought that binary wheels for scipy would be available on pypi for each
> > > OS now. Try updating pip and then using it to install scipy.
> > 
> > PyPI only has Windows wheels for NumPy, not SciPy. You can use
> > Christoph Gohlke's unofficial packages:
> > 
> > http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
> > http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
> 
> Tried to install those packages and I'm getting the error:
> 
> scipy-0.17.0-cp35-none-win_amd64.whl is not a supported wheel on this 
> platform.
> 
> i've also tried it with the versions:
> scipy-0.17.0-cp27-none-win_amd64.whl is not a supported wheel on this 
> platform.
> 
> scipy-0.17.0-cp34-none-win_amd64.whl is not a supported wheel on this 
> platform.
> 
> 
> I'm using win 10 x64. And python 3.5

Ok, I got it to install with the version:
scipy-0.17.0-cp35-none-win32.whl
I've also installed this version:
numpy-1.11.0+mkl-cp35-cp35m-win32.whl

But when i try to run:
>>> from scipy.stats.stats import pearsonr

I get this error:
Traceback (most recent call last):
  File "", line 1, in 
from scipy.stats.stats import pearsonr
  File 
"C:\Users\Liran\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\stats\__init__.py",
 line 338, in 
from .stats import *
  File 
"C:\Users\Liran\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\stats\stats.py",
 line 180, in 
import scipy.special as special
  File 
"C:\Users\Liran\AppData\Local\Programs\Python\Python35-32\lib\site-packages\scipy\special\__init__.py",
 line 627, in 
from ._ufuncs import *
  File "scipy\special\_ufuncs.pyx", line 1, in init scipy.special._ufuncs 
(scipy\special\_ufuncs.c:26242)
ImportError: DLL load failed: The specified module could not be found.


:(


Re: Failed install scipy lib

2016-04-20 Thread liran . maymoni
On Tuesday, April 19, 2016 at 9:21:42 PM UTC+3, eryk sun wrote:
> On Tue, Apr 19, 2016 at 12:05 PM, Oscar Benjamin
>  wrote:
> > On 19 Apr 2016 17:01,  wrote:
> >>
> >> i'm trying to use:
> >> "py -m pip install scipy"
> >> and after couple of lines a get an error saying:
> >
> > I thought that binary wheels for scipy would be available on pypi for each
> > OS now. Try updating pip and then using it to install scipy.
> 
> PyPI only has Windows wheels for NumPy, not SciPy. You can use
> Christoph Gohlke's unofficial packages:
> 
> http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
> http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy

Tried to install those packages and I'm getting the error:

scipy-0.17.0-cp35-none-win_amd64.whl is not a supported wheel on this platform.

i've also tried it with the versions:
scipy-0.17.0-cp27-none-win_amd64.whl is not a supported wheel on this platform.

scipy-0.17.0-cp34-none-win_amd64.whl is not a supported wheel on this platform.


I'm using win 10 x64. And python 3.5
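
A wheel's file name has to match the interpreter, not the operating system, so a 32-bit Python on 64-bit Windows rejects win_amd64 wheels. The "Python35-32" install directory in the traceback posted later in this thread suggests that is what happened here. The following is a simplified sketch with made-up function names, assuming CPython on Windows; pip's real compatibility check considers more tags than this.

```python
import struct
import sys


def interpreter_wheel_hint():
    """Return (python_tag, platform_tag) for the running CPython on Windows."""
    python_tag = "cp{0}{1}".format(*sys.version_info[:2])
    bits = struct.calcsize("P") * 8
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


def wheel_matches(filename, hint=None):
    """Compare a wheel's python and platform tags with the given or detected hint."""
    python_tag, platform_tag = hint or interpreter_wheel_hint()
    # Wheel names end in -<python tag>-<abi tag>-<platform tag>.whl
    parts = filename[:-len(".whl")].split("-")
    return parts[-3] == python_tag and parts[-1] == platform_tag


# A 32-bit Python 3.5 accepts the win32 wheel but not the win_amd64 one.
hint_32bit_py35 = ("cp35", "win32")
print(wheel_matches("scipy-0.17.0-cp35-none-win_amd64.whl", hint_32bit_py35))
print(wheel_matches("scipy-0.17.0-cp35-none-win32.whl", hint_32bit_py35))
```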


djqgrid 0.2.4 error import json_helpers

2016-04-20 Thread asimkon .
Hello!

I want to use this wrapper for
a module in my Django 1.9.4 installation. I have completed all the configuration
steps successfully, but in the end I get the following error message in
my browser:

Invalid template library specified. ImportError raised when trying to
load 'djqgrid.templatetags.djqgrid': cannot import name json_helpers


Inside my templatetags folder  (djqgrid.py),  i have the following commands:

from django import template
from djqgrid import json_helpers

Any idea to get a solution ?

I sent a few emails to the author of this wrapper but i have not got a
reply yet!

Regards
Kostas Asimakopoulos
Twitter @asimkon


Re: Guido sees the light: PEP 8 updated

2016-04-20 Thread Oscar Benjamin
On 20 April 2016 at 02:38, Steven D'Aprano  wrote:
>
> "Oh no! We're having trouble displaying this Scratch project.
>
> If you are on a mobile phone or tablet, try visiting this project on a
> computer.
>
> If you're on a computer, your Flash player might be disabled, missing, or
> out of date."
>
> Yeah, thanks guys. Really helpful.

Having a flash-enabled browser is a lower barrier to entry for most
people in the world than having a code editor and (being able to use)
say a terminal to run your code.

--
Oscar


Re: Guido sees the light: PEP 8 updated

2016-04-20 Thread Oscar Benjamin
On 20 April 2016 at 07:08, Terry Reedy  wrote:
> On 4/19/2016 11:41 PM, Chris Angelico wrote:
>>
>> On Wed, Apr 20, 2016 at 1:23 PM, Terry Reedy  wrote:

>>>> It kinda looks like Hypertalk syntax, which some of you may remember I'm
>>>> exceedingly fond of. There's no reason why a GUI editor couldn't display
>>>> Python code using such "building block" structure. E.g. indented blocks
>>>> could use colour and shape cues to reinforce the structure of the code,
>>>> just as Scratch does.
>>>
>>>
>>>
>>> That is an interesting idea.  Perhaps I have been stuck in either/or
>>> thinking -- either graphical or textual. With tk Text (IDLE), it would be
>>> possible to tag each (4-space) indent with a color for the compound
>>> statement keyword causing the indent.
>>>
>>
>> Interesting indeed! Tell me if I've understood you correctly. You'd
>> display this code:
>>
>> def func(x):
>>     for n in range(1, x):
>>         while n < x:
>>             if n % 2:
>>                 n = (n + 1) * 3 / 2
>>             else:
>>                 n = n * 2 + 3
>>
>> with stripes of colour, with the entire first column of spaces all
>> tied to the "def", and then the next block of four tied to the "for",
>> etc?
>
>
> Exactly.

Take a look at bluej which is for Java. It surrounds different
constructs with different coloured rectangles. It also provides
UML-ish views of the classes in a project. My students seemed to like
it.
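
The indent-striping idea quoted above can be prototyped without any GUI by working out, for each line, which compound-statement keyword "owns" each 4-space indent column. This is only a rough sketch: the function name is invented, and it ignores tabs, continuation lines and one-line compound statements. An editor could then map each owner to a Tk Text tag (tag_add / tag_configure) to colour the stripes.

```python
COMPOUND_KEYWORDS = ("def", "class", "for", "while", "if", "elif", "else",
                     "try", "except", "finally", "with")


def indent_owners(source, width=4):
    """For each line, list the keywords owning its indent columns, outermost first."""
    owners = []   # owners[i] is the keyword that opened indent level i + 1
    result = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            result.append([])
            continue
        depth = (len(line) - len(stripped)) // width
        del owners[depth:]            # leaving deeper blocks drops their owners
        result.append(list(owners))
        first = stripped.split()[0].rstrip(":")   # "else:" -> "else"
        if stripped.rstrip().endswith(":") and first in COMPOUND_KEYWORDS:
            owners.append(first)
    return result


sample = (
    "def func(x):\n"
    "    for n in range(1, x):\n"
    "        while n < x:\n"
    "            if n % 2:\n"
    "                n = (n + 1) * 3 / 2\n"
    "            else:\n"
    "                n = n * 2 + 3\n"
)
owners_per_line = indent_owners(sample)
for text, owned_by in zip(sample.splitlines(), owners_per_line):
    print(owned_by, text.strip())
```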

--
Oscar