Re: Ripping Hebrew CDs

2007-04-25 Thread Ehud Karni
On Mon, 23 Apr 2007 16:24:17 Yedidyah Bar-David wrote:

 On Mon, Apr 23, 2007 at 02:41:51PM +0300, Ehud Karni wrote:
 
  I prefer the file name to be in ISO-8859-8 (8 bits) and not UTF-8.
  Then I can see the Hebrew in Emacs and xterm, but not in Gnome or KDE

 Any reason not to use utf-8 with xterm/emacs? I admit I do not use emacs
 so I do not know how comfortable it is, but in xterm/vim it's fine, tab
 completion and everything.

The real problem is that you can't display both ISO-8859-8 (8-bit Hebrew)
and UTF-8 Hebrew correctly at the same time. My company has a LOT of
files encoded in ISO-8859-8, and that won't change in the near (and,
I believe, also in the far[1]) future. I use many simple UNIX tools
(cat, more, sed, etc.) to manipulate and view file names and content, on
both X and tty terminals. So, ISO-8859-8 it is.
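To see why the two encodings cannot share one terminal, compare the bytes of the same Hebrew word in each. This is a minimal sketch that assumes an `iconv` accepting the names `UTF-8` and `ISO-8859-8` (glibc and GNU libiconv both do). The word is written with octal escapes so the script itself stays pure ASCII.

```shell
#!/bin/sh
# Show the raw bytes of the Hebrew word "shalom" in UTF-8 and in ISO-8859-8.
# Octal escapes keep this file ASCII-only: shin, lamed, vav, final mem.
shalom_utf8=$(printf '\327\251\327\234\327\225\327\235')

hexbytes () {   # print stdin as one run of lower-case hex digits
    od -An -tx1 | tr -d ' \n'
}

printf 'UTF-8:      %s\n' "$(printf '%s' "$shalom_utf8" | hexbytes)"
printf 'ISO-8859-8: %s\n' "$(printf '%s' "$shalom_utf8" | iconv -f UTF-8 -t ISO-8859-8 | hexbytes)"
```

Every UTF-8 letter is two bytes (`d7 a9 d7 9c d7 95 d7 9d`), and every ISO-8859-8 letter is one (`f9 ec e5 ed`). An ISO-8859-8 terminal therefore shows each UTF-8 letter as two unrelated symbols, and a UTF-8 terminal treats the lone high bytes as invalid sequences.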

Ehud.


[1] Unless the company somehow becomes international and has to handle
many languages simultaneously, and even then, I think they will
use fixed-width characters (16 bits) and not UTF-8.


--
 Ehud Karni   Tel: +972-3-7966-561  /\
 Mivtach - Simon  Fax: +972-3-7966-667  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D http://www.keyserver.net/Better Safe Than Sorry

To unsubscribe, send mail to [EMAIL PROTECTED] with
the word unsubscribe in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]



Re: Ripping Hebrew CDs

2007-04-25 Thread Ehud Karni
On Tue, 24 Apr 2007 11:56:38 +0300, Yuval Hager wrote:

 On Monday, 23 April 2007, 15:46, Ehud Karni wrote:

  You have to re-encode the file name to Hebrew UTF-8 like this:
 
  NEWNM=`echo "$NM" | iconv -futf8 -tlatin1 | iconv -fhebrew -tutf8`
 

 Thanks! I've been looking for a way to do this for some time. I didn't think
 there could even be Latin-1 encoded Hebrew inside UTF-8.

 Anyway, that solves my problem too with file names, but how should I handle
 ID3 tags?

You can use the id3lib package ( http://sourceforge.net/projects/id3lib/ )
to extract the names that should be Hebrew, try to convert them using something
like the command above, and, if successful, replace the original names.
You can then write a script that will do it to all your songs.
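A script of the kind described above might look like this for FLAC files. This is a sketch under stated assumptions. It swaps in `metaflac` (from the FLAC tools) for id3lib, because FLAC files carry UTF-8 Vorbis comments rather than ID3 tags. It handles only the TITLE field, and `fix_hebrew` and `retag_titles` are hypothetical names. It also assumes `metaflac` applies `--remove-tag` before `--set-tag` when both are given in that order.

```shell
#!/bin/sh
# Sketch: repair Hebrew TITLE tags that were stored as UTF-8-encoded Latin-1.
# Assumes metaflac and an iconv that knows ISO-8859-1 and ISO-8859-8.

# fix_hebrew STRING: undo "ISO-8859-8 bytes read as Latin-1, saved as UTF-8".
# Fails (non-zero) if either conversion step fails.
fix_hebrew () {
    latin=$(printf '%s' "$1" | iconv -f UTF-8 -t ISO-8859-1 2>/dev/null) || return 1
    printf '%s' "$latin" | iconv -f ISO-8859-8 -t UTF-8 2>/dev/null
}

retag_titles () {
    for f in "$@"; do
        # metaflac prints "TITLE=value"; field names are case-insensitive
        old=$(metaflac --show-tag=TITLE "$f" | sed -n '1s/^[Tt][Ii][Tt][Ll][Ee]=//p')
        [ -n "$old" ] || continue
        new=$(fix_hebrew "$old") || continue     # not Latin-1 mojibake: leave it
        [ "$new" != "$old" ] || continue         # plain ASCII: nothing to do
        echo "$f: $old == $new"
        metaflac --remove-tag=TITLE --set-tag="TITLE=$new" "$f"
    done
}

retag_titles "$@"
```

Usage would be something like `sh retag.sh *.flac`. A genuinely accented Latin title (for example a French one) also converts "successfully", so check the printed old/new pairs on a copy of the files first.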

I'm sure you can get this package for any Linux distribution. But also
read this: https://bugs.launchpad.net/debian/+source/id3lib3.8.3/+bug/54136

There are other ways (using simpler tools) to extract the tag and
replace it, but you'll have to do it with your own scripts.

Ehud.


--
 Ehud Karni
 Senior System Support
 Mivtach - Simon Insurance agencies
 Better Safe Than Sorry   Tel: 03-7966-561   Fax: 03-7966-667
 http://www.mvs.co.il   mailto:[EMAIL PROTECTED]




Re: Ripping Hebrew CDs

2007-04-25 Thread Yuval Hager
On Wednesday, 25 April 2007, 15:09, Ehud Karni wrote:
 On Tue, 24 Apr 2007 11:56:38 +0300, Yuval Hager wrote:
  On Monday, 23 April 2007, 15:46, Ehud Karni wrote:
   You have to re-encode the file name to Hebrew UTF-8 like this:
  
   NEWNM=`echo "$NM" | iconv -futf8 -tlatin1 | iconv -fhebrew -tutf8`
 
  Thanks! I've been looking for a way to do this for some time. I didn't
  think there could even be Latin-1 encoded Hebrew inside UTF-8.
 
  Anyway, that solves my problem too with file names, but how should I
  handle ID3 tags?

 You can use the id3lib package ( http://sourceforge.net/projects/id3lib/ )
 to extract the names that should be Hebrew, try to convert them using something
 like the command above, and, if successful, replace the original names.
 You can then write a script that will do it to all your songs.


Exactly! 
It works flawlessly.

 I'm sure you can get this package for any Linux distribution. But also
 read this: https://bugs.launchpad.net/debian/+source/id3lib3.8.3/+bug/54136


Luckily, my distribution (Gentoo) includes this patch by default in the latest
id3lib version.

Anyway, EasyTag v2.0 proved painless even with Hebrew, and it does a great job
retagging where my ripping program doesn't get it right. EasyTag even checks
whether id3lib is broken with regard to UTF-8 tags and notifies you about it.

Thanks!

--yuval




Re: Ripping Hebrew CDs

2007-04-24 Thread Yuval Hager
On Monday, 23 April 2007, 15:46, Ehud Karni wrote:
 On Mon, 23 Apr 2007 15:25:48 Hadar wrote:
 [snip]
 Good that you sent the list with the file name. It is encoded in UTF-8,
 but as Latin-1 rather than Hebrew (the song name is Chadashot Meha-Yareach,
 News from the Moon, right?).

 You have to re-encode the file name to Hebrew UTF-8 like this:

 NEWNM=`echo "$NM" | iconv -futf8 -tlatin1 | iconv -fhebrew -tutf8`


Thanks! I've been looking for a way to do this for some time. I didn't think
there could even be Latin-1 encoded Hebrew inside UTF-8.

Anyway, that solves my problem too with file names, but how should I handle 
ID3 tags? 

I am using k3b for the ripping, and Amarok does not seem to agree with it on
the character encoding of the ID3 tags. I've tried EasyTag, but it didn't write
the tags correctly (and did not show them on the screen either).
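One possible reason two programs disagree is the ID3v2 text-encoding byte. Every text frame starts with a byte that declares ISO-8859-1 (0), UTF-16 with BOM (1), UTF-16BE (2) or UTF-8 (3), and the last two exist only in ID3v2.4. If Hebrew bytes are written under encoding 0, a player that honours the byte will show Latin gibberish. The diagnostic sketch below reports that byte for the title (TIT2) frame. It assumes an ID3v2.3 or 2.4 tag at the start of the file with no extended header and no unsynchronisation, and the helper names are hypothetical.

```shell
#!/bin/sh
# Diagnostic sketch: print the declared text encoding of the TIT2 frame.

byte_at () {    # byte_at FILE OFFSET -> decimal value of that byte (empty at EOF)
    od -An -tu1 -j "$2" -N 1 "$1" 2>/dev/null | tr -d ' \n'
}

id3_title_encoding () {
    f=$1
    [ "$(dd if="$f" bs=1 count=3 2>/dev/null)" = ID3 ] || { echo "no ID3v2 tag"; return 1; }
    ver=$(byte_at "$f" 3)                     # 3 = ID3v2.3, 4 = ID3v2.4
    size=0                                    # tag size: 4 syncsafe bytes, 7 bits each
    for i in 6 7 8 9; do size=$(( size * 128 + $(byte_at "$f" $i) )); done
    pos=10
    end=$(( 10 + size ))
    while [ $(( pos + 10 )) -le "$end" ]; do
        b0=$(byte_at "$f" "$pos")
        if [ -z "$b0" ] || [ "$b0" -eq 0 ]; then break; fi    # EOF or padding
        id=$(dd if="$f" bs=1 skip="$pos" count=4 2>/dev/null)
        fsize=0                               # frame size: plain in 2.3, syncsafe in 2.4
        for i in 4 5 6 7; do
            b=$(byte_at "$f" $(( pos + i )))
            if [ "$ver" -ge 4 ]; then fsize=$(( fsize * 128 + b ))
            else fsize=$(( fsize * 256 + b )); fi
        done
        if [ "$id" = TIT2 ]; then
            case $(byte_at "$f" $(( pos + 10 ))) in
                0) echo "ISO-8859-1" ;;
                1) echo "UTF-16 with BOM" ;;
                2) echo "UTF-16BE" ;;
                3) echo "UTF-8" ;;
                *) echo "unknown" ;;
            esac
            return 0
        fi
        pos=$(( pos + 10 + fsize ))           # skip header and body to the next frame
    done
    echo "no TIT2 frame"
    return 1
}

for f in "$@"; do printf '%s: %s\n' "$f" "$(id3_title_encoding "$f")"; done
```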

Any insight would be appreciated..

--yuval




Ripping Hebrew CDs

2007-04-23 Thread Hadar

Hello,

I'm encoding audio CDs into FLAC files using Sound Juicer.
The albums are automatically recognized and the song names are downloaded.
When ripping Hebrew albums, the song names are sometimes malformed
(a.k.a. gibberish, certainly not Hebrew characters).

I've installed the Hebrew packages for Debian and also tried K3B without
success...
Is it a local problem, or a problem with the remote database?
If the problem is local, how can I fix it?

Thanks,
Hadar


Re: Ripping Hebrew CDs

2007-04-23 Thread Geoffrey S. Mendelson
On Mon, Apr 23, 2007 at 11:38:35AM +0300, Hadar wrote:

 I'm encoding audio CDs into FLAC files using Sound Juicer.
 The albums are automatically recognized and the song names are downloaded.
 When ripping Hebrew albums, the song names are sometimes malformed
 (a.k.a. gibberish, certainly not Hebrew characters).

They are Hebrew characters, but not in the character set you expect them
to be in.
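One quick way to tell which character set a garbled name is actually in is to look at its bytes. The classifier below is a heuristic sketch with hypothetical names (`guess_hebrew_charset`, `has_bytes`). It relies on a few byte ranges:

- UTF-8 Hebrew letters start with byte 0xD7.
- Latin-1 characters in UTF-8 start with 0xC2 or 0xC3.
- ISO-8859-8 (and Windows-1255) letters occupy 0xE0-0xFA.
- The DOS Hebrew code pages put the letters at 0x80-0x9A.

```shell
#!/bin/sh
# Heuristic sketch: guess how a Hebrew file name is encoded from its bytes.

has_bytes () {  # has_bytes STRING TR_SET: succeed if STRING contains a byte in TR_SET
    [ -n "$(printf '%s' "$1" | LC_ALL=C tr -dc "$2")" ]
}

guess_hebrew_charset () {
    if printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        # valid UTF-8
        if has_bytes "$1" '\327'; then echo "UTF-8 Hebrew"
        elif has_bytes "$1" '\302\303'; then echo "UTF-8 Latin-1 (probably mis-decoded Hebrew)"
        else echo "ASCII or other UTF-8"; fi
    elif has_bytes "$1" '\340-\372'; then echo "ISO-8859-8 or Windows-1255"
    elif has_bytes "$1" '\200-\232'; then echo "DOS Hebrew code page"
    else echo "unknown 8-bit"; fi
}

for name in "$@"; do printf '%s: %s\n' "$name" "$(guess_hebrew_charset "$name")"; done
```

Running it as `sh guess.sh *` in the ripped directory gives a first hint about whether the names arrived from the database already mangled or were mangled locally.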

I get this when I rip CDs using Audiograbber on Windows. The CD is
recognized in the freedb and I often get usable Hebrew names. When I
copy the files to a Linux server via Samba, the names become garbled.

Sometimes they still show properly on a Windows computer, sometimes they
are garbage there too.

I know there are ways to fix the Samba side of it, but I don't really care.
Someone who had many Hebrew albums and wanted the names in readable
Hebrew would.


 Is it a local problem, or a problem with the remote database?
 If the problem is local, how can I fix it?

There are two places where the name of the album and the name of the
song are found. One is the obvious, the file name which may or may
not be important to you. 

To me, having the name in a simple form that I can easily understand
and manipulate is more important than having a proper
representation of the language. Therefore I use the English name of the
artist and, for the song name, a simple translation, if anything at all.

I remove punctuation and replace spaces and special characters
with underbars. I convert all file names to lower case.

This is done with a Perl program which has gotten quite sophisticated over
the years.
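The Perl program itself isn't shown, but the renaming rules listed above can be approximated in shell:

- lower-case everything,
- drop punctuation,
- turn spaces and other special characters into underbars.

This is a hypothetical `simplify_name` function, not Geoff's program. It keeps the file extension apart and treats every non-ASCII byte (including Hebrew) as a special character.

```shell
#!/bin/sh
# Sketch of the renaming rules: lower case, no punctuation, underbars elsewhere.

simplify_name () {
    name=$1 ext=
    case "$name" in
        ?*.*) ext=.${name##*.} ; name=${name%.*} ;;    # keep the extension apart
    esac
    name=$(printf '%s' "$name" | LC_ALL=C tr 'A-Z' 'a-z' |
           LC_ALL=C tr -d '[:punct:]' |                 # remove punctuation
           LC_ALL=C tr -c 'a-z0-9' '_' |                # spaces, Hebrew, etc. -> _
           tr -s '_')                                   # squeeze runs of _
    name=${name#_} ; name=${name%_}                     # trim a leading/trailing _
    ext=$(printf '%s' "$ext" | LC_ALL=C tr 'A-Z' 'a-z')
    printf '%s%s\n' "$name" "$ext"
}
```

For example, `simplify_name '01 - News From The Moon.ogg'` prints `01_news_from_the_moon.ogg`.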

The second location, and the more important one for Hebrew, is the tags
in the files themselves. In MP3 files these are ID3 tags (in practice
ID3v2); FLAC and Ogg files use Vorbis comments instead.

Most players display the tags instead of the file name, often
by default, so it is a matter of making sure the player has
the correct fonts available and uses them.

Of course I am talking about playing them on a computer. Playing them on
an MP3 player and getting proper Hebrew may be impossible. It depends
upon the player.

Geoff.

-- 
Geoffrey S. Mendelson, Jerusalem, Israel [EMAIL PROTECTED]  N3OWJ/4X1GM
IL Voice: (07)-7424-1667  Fax ONLY: 972-2-648-1443 U.S. Voice: 1-215-821-1838 
Visit my 'blog at http://geoffstechno.livejournal.com/




Re: Ripping Hebrew CDs

2007-04-23 Thread Hadar

On 4/23/07, Geoffrey S. Mendelson [EMAIL PROTECTED] wrote:


On Mon, Apr 23, 2007 at 01:06:12PM +0300, Hadar wrote:

 I don't care if the names are in Hebrew or English, as long as I get them
 automatically. Is there any English database for Hebrew albums? Ripping
 a large number of albums and typing the song names manually can take ages.

No, but the freedb often has multiple records for the same CD. Audiograbber
lets you choose between them; does your ripper? It may be an option.



Sound Juicer doesn't...
Is anyone using an application that plays well with Hebrew text?


Re: Ripping Hebrew CDs

2007-04-23 Thread Ehud Karni
On Mon, 23 Apr 2007 12:24:54 Geoffrey S. Mendelson wrote:

 On Mon, Apr 23, 2007 at 11:38:35AM Hadar wrote:

  I'm encoding audio CDs into FLAC files using Sound Juicer.
  The albums are automatically recognized and the song names are downloaded.
  When ripping Hebrew albums, the song names are sometimes malformed
  (a.k.a. gibberish, certainly not Hebrew characters).

 They are Hebrew characters, but not in the character set you expect them
 to be in.

I prefer the file name to be in ISO-8859-8 (8 bits) and not UTF-8.
Then I can see the Hebrew in Emacs and xterm, but not in Gnome or KDE
applications. I assume you get your Hebrew names in either ISO-8859-8
or PC DOS (CP856) encoding. I have 2 small shell scripts (below) that
convert ALL Hebrew names in a directory tree to ISO-8859-8 / UTF-8.

Ehud.


#! /bin/sh -e
# Translate all file names in this directory tree to iso-8859-8
# -

LC_ALL=C ; export LC_ALL        # work on raw bytes, not characters
D7=`printf '\327'`              # 0xD7 is a sign for Hebrew UTF-8

chk_nm ()
{
    printf '\n\n now working on %s\n' "`pwd`"

    for DFL in *
    do
        [ -e "$DFL" ] || [ -L "$DFL" ] || continue      # empty directory
        case "$DFL" in
        *"$D7"* )   # Hebrew UTF-8
            NDFL=`printf '%s\n' "$DFL" | iconv -futf8 -thebrew` ;;

        * )         # NOT UTF-8, DOS Hebrew is 0x80-0x9A, ISO-8859-8 is 0xE0-0xFA
            NDFL=`printf '%s\n' "$DFL" | tr '\200-\232' '\340-\372'` ;;
        esac
        if [ "$DFL" != "$NDFL" ] ; then     # name has changed ?
            mv -- "$DFL" "$NDFL"            # rename
            echo "$DFL == $NDFL"            # show to user
            DFL=$NDFL                       # for recursive checking
        fi

        if [ ! -L "$DFL" ] && [ -d "$DFL" ] ; then
            ( cd "$DFL" && chk_nm )
        fi
    done
}

chk_nm      # start scanning

## trns-utf-2-heb.sh ##
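The 0xD7 test in these scripts works because every Hebrew letter (U+05D0 to U+05EA) encodes in UTF-8 as two bytes whose first byte is 0xD7 (0xD7 0x90 through 0xD7 0xAA). The sketch below uses a hypothetical `is_utf8_hebrew` helper that also requires the whole name to be valid UTF-8. With that extra check, an ISO-8859-8 name that happens to contain byte 0xD7 is not misdetected; 0xD7 is the multiplication sign in that code page.

```shell
#!/bin/sh
# Sketch: succeed only if NAME is valid UTF-8 and contains the lead byte 0xD7.
is_utf8_hebrew () {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 || return 1   # valid UTF-8?
    [ -n "$(printf '%s' "$1" | LC_ALL=C tr -dc '\327')" ]                      # has 0xD7?
}
```

It could stand in for the `case` pattern in either script, at the cost of one extra `iconv` call per file.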




#! /bin/sh -e
# Translate all file names in this directory tree to utf-8
# 

LC_ALL=C ; export LC_ALL        # work on raw bytes, not characters
D7=`printf '\327'`              # 0xD7 is a sign for Hebrew UTF-8

chk_nm ()
{
    printf '\n\n now working on %s\n' "`pwd`"

    for DFL in *
    do
        [ -e "$DFL" ] || [ -L "$DFL" ] || continue      # empty directory
        case "$DFL" in
        *"$D7"* ) ;;    # already Hebrew UTF-8

        * )             # NOT UTF-8, DOS Hebrew is 0x80-0x9A, ISO-8859-8 is 0xE0-0xFA
            NDFL=`printf '%s\n' "$DFL" | tr '\200-\232' '\340-\372' | iconv -fhebrew -tutf8`
            if [ "$DFL" != "$NDFL" ] ; then # had any Hebrew ?
                mv -- "$DFL" "$NDFL"        # rename
                echo "$DFL == $NDFL"        # show to user
                DFL=$NDFL                   # for recursive checking
            fi ;;
        esac

        if [ ! -L "$DFL" ] && [ -d "$DFL" ] ; then
            ( cd "$DFL" && chk_nm )
        fi
    done
}

chk_nm      # start scanning

## trns-heb-2-utf.sh ##
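The `tr` step in both scripts depends on the layout of the DOS Hebrew code pages. They place alef through tav at 0x80-0x9A, exactly 0x60 below their ISO-8859-8 positions 0xE0-0xFA. A single byte-range `tr` therefore converts one to the other. Here is that step on its own as a minimal sketch (the `dos2iso` name is hypothetical). It runs in the C locale so `tr` works on bytes.

```shell
#!/bin/sh
# DOS Hebrew (0x80-0x9A) -> ISO-8859-8 (0xE0-0xFA); ASCII is left alone.
dos2iso () {
    LC_ALL=C tr '\200-\232' '\340-\372'
}

# alef, bet, gimel in DOS Hebrew, then on to UTF-8 for display
printf '\200\201\202.ogg' | dos2iso | iconv -f ISO-8859-8 -t UTF-8 | od -An -tx1
```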






Re: Ripping Hebrew CDs

2007-04-23 Thread Ehud Karni
On Mon, 23 Apr 2007 15:25:48 Hadar wrote:

 Thanks for the scripts!
 If I understand you correctly, trns-heb-2-utf.sh is what I need.
 When I tried it on a directory, it simply wiped out the Hebrew characters.
 Here's some debugging info:

 + for DFL in '*'
 + case $DFL in
 ++ echo '01 - çãùåú îäéøç.ogg'
 ++ tr '[€-š]' '[ת-א]'
 ++ iconv -fhebrew -tutf8
 iconv: illegal input sequence at position 5

 Looks like the input text is not iso-8859-8 (HEBREW) encoded.
 I played with iconv a little but couldn't find the appropriate encoding.
 Any suggestion?

First, Hadar: are you a boy or a girl?

Good that you sent the list with the file name. It is encoded in UTF-8,
but as Latin-1 rather than Hebrew: the ISO-8859-8 bytes were read as Latin-1
characters and then encoded in UTF-8 (the song name is Chadashot Meha-Yareach,
News from the Moon, right?).

You have to re-encode the file name to Hebrew UTF-8 like this:

NEWNM=`echo "$NM" | iconv -futf8 -tlatin1 | iconv -fhebrew -tutf8`
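The same two-step conversion can be applied to a whole directory. This sketch assumes the canonical iconv names `ISO-8859-1` and `ISO-8859-8` in place of the `latin1`/`hebrew` aliases above, and `fix_names` is a hypothetical name. A file is renamed only when both conversions succeed and the result differs, so plain ASCII names and names that are already correct UTF-8 Hebrew are left alone.

```shell
#!/bin/sh
# Sketch: rename files whose Hebrew names were saved as UTF-8-encoded Latin-1.
fix_names () {
    for NM in "${1:-.}"/*; do
        [ -e "$NM" ] || continue                        # empty directory
        dir=${NM%/*} base=${NM##*/}
        # UTF-8 -> Latin-1 recovers the original ISO-8859-8 bytes ...
        raw=$(printf '%s' "$base" | iconv -f UTF-8 -t ISO-8859-1 2>/dev/null) || continue
        # ... which are then decoded as Hebrew and re-encoded as UTF-8.
        NEWNM=$(printf '%s' "$raw" | iconv -f ISO-8859-8 -t UTF-8 2>/dev/null) || continue
        [ "$NEWNM" != "$base" ] || continue             # nothing to fix
        if [ -e "$dir/$NEWNM" ]; then echo "exists: $dir/$NEWNM" >&2; continue; fi
        mv -- "$NM" "$dir/$NEWNM" && echo "$base == $NEWNM"
    done
}

[ $# -eq 0 ] || fix_names "$1"      # usage: sh fix_names.sh DIRECTORY
```

Names that really are accented Latin text would also be converted, so try it on a copy first.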

Ehud.





Re: Ripping Hebrew CDs

2007-04-23 Thread Yedidyah Bar-David
On Mon, Apr 23, 2007 at 02:41:51PM +0300, Ehud Karni wrote:
 On Mon, 23 Apr 2007 12:24:54 Geoffrey S. Mendelson wrote:
 
  On Mon, Apr 23, 2007 at 11:38:35AM Hadar wrote:
 
   I'm encoding audio CDs into FLAC files using Sound Juicer.
   The albums are automatically recognized and the song names are 
   downloaded.
   When ripping Hebrew albums, the song names are sometimes malformed
   (a.k.a. gibberish, certainly not Hebrew characters).
 
  They are Hebrew characters, but not in the character set you expect them
  to be in.
 
 I prefer the file name to be in ISO-8859-8 (8 bits) and not UTF-8.
 Then I can see the Hebrew in Emacs and xterm, but not in Gnome or KDE

Any reason not to use utf-8 with xterm/emacs? I admit I do not use emacs
so I do not know how comfortable it is, but in xterm/vim it's fine, tab
completion and everything.
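In the mixed situation Ehud describes, a UTF-8 xterm could still show legacy ISO-8859-8 output by converting it on the fly, for display only, without touching the files. This is a minimal sketch; `show8` is a hypothetical wrapper name. It assumes the command's output is entirely ISO-8859-8, with ASCII passing through unchanged.

```shell
#!/bin/sh
# Run a command and re-encode its ISO-8859-8 output to UTF-8 for display.
show8 () {
    "$@" | iconv -f ISO-8859-8 -t UTF-8
}

# Example uses (not run here): show8 ls -1 ; show8 cat old-hebrew.txt
```

The reverse direction (UTF-8 names viewed in an ISO-8859-8 xterm) would use `iconv -f UTF-8 -t ISO-8859-8` instead.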
-- 
Didi





Re: Ripping Hebrew CDs

2007-04-23 Thread Hadar

On 4/23/07, Ehud Karni [EMAIL PROTECTED] wrote:


On Mon, 23 Apr 2007 12:24:54 Geoffrey S. Mendelson wrote:

 On Mon, Apr 23, 2007 at 11:38:35AM Hadar wrote:

  I'm encoding audio CDs into FLAC files using Sound Juicer.
  The albums are automatically recognized and the song names are downloaded.
  When ripping Hebrew albums, the song names are sometimes malformed
  (a.k.a. gibberish, certainly not Hebrew characters).

 They are Hebrew characters, but not in the character set you expect them
 to be in.

I prefer the file name to be in ISO-8859-8 (8 bits) and not UTF-8.
Then I can see the Hebrew in Emacs and xterm, but not in Gnome or KDE
applications. I assume you get your Hebrew name in either ISO-8859-8
or in PC DOS (CP856). I have 2 small shell scripts (below) that
convert ALL Hebrew names in a directory tree to ISO-8859-8 / UTF-8.



Thanks for the scripts!
If I understand you correctly, trns-heb-2-utf.sh is what I need.
When I tried it on a directory, it simply wiped out the Hebrew characters.
Here's some debugging info:

+ for DFL in '*'
+ case $DFL in
++ echo '01 - çãùåú îäéøç.ogg'
++ tr '[€-š]' '[ת-א]'
++ iconv -fhebrew -tutf8
iconv: illegal input sequence at position 5

Looks like the input text is not iso-8859-8 (HEBREW) encoded.
I played with iconv a little but couldn't find the appropriate encoding.
Any suggestion?
